ABSTRACT

The multi-backend database system (MBDS) in the
Laboratory for Database System Research at the Naval
Postgraduate School is designed to overcome the
performance-gain and capacity-growth problems of either the
traditional database system or the single-backend-software
database system. The original MBDS supported four primary
operations, namely, RETRIEVE, DELETE, UPDATE and INSERT.

This thesis presents the design and implementation of
the fifth primary operation, the RETRIEVE-COMMON operation.
The retrieve-common operation is used to merge two files by
their common attribute values. First, the overall design
and implementation of MBDS is reviewed. Then, several
alternatives are compared and analyzed to select the best
one as our design and implementation approach. Finally, we
describe the detailed design and the implementation. Our
goal is to maximize the utilization of, and minimize the
effects on, the existing system.

To integrate our design into MBDS, several modifications
are made. The algorithms for the modifications and their
program specifications are provided in Chapters IV and V
and the Appendices.

I. INTRODUCTION



A. THE SCOPE OF THE THESIS 

A database is a collection of stored operational data,
and a database system is a computer-based system whose
overall purpose is to record and maintain information (data)
[Ref. 1]. The traditional approach to managing the database
system is to run the database system software as an
application program in a mainframe computer system. The
database system must share the use and the control of the
mainframe computer resources with all of the other
applications of the computer system. The performance of
this approach suffers whenever there is an increase in
either the usage of the computer system or the database
applications.

One solution to this problem is to offload the database
system from the mainframe to a single, dedicated backend
computer. The backend computer has its own disk storage and
is used to perform database operations exclusively
[Refs. 2,3]. This approach is known as the single software
backend approach. Database systems based on this approach
are referred to as software single backend database systems.
However, this approach still has a disadvantage:
performance upgrades require the replacement of the
backend, and this may entail software modifications and
hardware disruption [Ref. 4 : p. 4].

A second approach to solving the database performance
problem is to develop a special-purpose database machine
with specially designed hardware. However, the
cost-effectiveness of this approach, known as the hardware
backend approach, has not yet been demonstrated [Ref. 5].

In order to overcome the performance-gain and
capacity-growth problems of either the traditional database
system or the single backend software system, research on
a multi-backend database system, known as MBDS, is conducted
in the Laboratory for Database Systems Research at the
Naval Postgraduate School. Instead of a single backend
computer, MBDS uses several identical (both in hardware and
in software) minicomputers as its backend computers in a
parallel fashion in order to achieve performance gain and
capacity growth. These backends with their respective disk
systems are connected with another minicomputer, called the
backend controller. The controller is responsible for
supervising the execution of parallel database operations on
the backends and for interfacing with the hosts and the
user. Users access the system either by way of the host or
through the controller directly (as shown in Figure 1.1).




Figure 1.1 The Multi-Backend Database System (MBDS).

The attribute-based data language (ABDL) [Ref. 6] is
used as the basis of the data language of MBDS. Currently,
ABDL supports four primary database operations: RETRIEVE,
DELETE, UPDATE and INSERT. The functions of these four
database operations are shown in Figure 1.2.

    Operation    Function
    ---------    ----------------------------------
    RETRIEVE     Retrieve records from the database
    DELETE       Delete records from the database
    UPDATE       Modify records of the database
    INSERT       Insert records into the database

Figure 1.2 The Functions of the
Current MBDS Database Operations.

In order to make MBDS a more complete database system,
a fifth operation, the RETRIEVE-COMMON operation, which is
used to merge two files by common attribute values, has been
proposed [Ref. 7]. This thesis focuses on the design and
implementation of the RETRIEVE-COMMON operation of MBDS. We
propose several alternative design and implementation
strategies, then evaluate and analyze these alternatives
based on their time complexities, their effects on
the existing system and the design goals of MBDS. According
to the results of the analysis, we choose the best
alternative to design and implement the fifth operation.



B. THE ORGANIZATION OF THE THESIS



The rest of this thesis is organized as follows. In
chapter II we give an overview of the architecture of
MBDS. We describe the design goals, the underlying and
intended hardware, the process structure, the data model and
the data language of MBDS. In chapter III, we first
define the intended operation and the syntax of the
RETRIEVE-COMMON operation, and then evaluate and analyze the
alternatives for the design and implementation. According
to the analysis, we select the best alternative for adding
the retrieve-common operation to MBDS. In chapter IV,
we present the details of the design for the selected
approach. We also consider the possible effects of this
approach on the existing system. In chapter V, we describe
how to incorporate our design into MBDS. Our goal is to
minimize the effects of the implementation. Finally, this
thesis is summarized and concluded in chapter VI. It is
hoped that this thesis will be of help to
future work on MBDS.

II. THE MULTI-BACKEND DATABASE SYSTEM (MBDS)



In this chapter we briefly review the configuration
and the theory of operations of MBDS. Most of the
information provided in this chapter has been extracted from
[Refs. 4,7 : pp. 1-68, 7-20]. Interested readers are
encouraged to refer to the references.

A. THE SYSTEM GOALS

As mentioned in chapter I, MBDS is designed to overcome
the performance problems and upgrade issues of the
traditional mainframe-based or the software single-backend
database system. In other words, the overall goal for MBDS
is to prove that:

(1) the system is easily extensible; and

(2) the performance gain and improvement are
proportional to the multiplicity of processing and
storage elements [Ref. 4 : pp. 1-5].

In order to achieve the aforementioned goal, the design
requirements and their correlated design issues for
designing and implementing MBDS have been defined in [Ref. 7
: pp. 7-10].

1. Design Requirements

There are three main design requirements for MBDS. 

(1) The system must be expandable. 

(2) Both the hardware and software are generic. 

(3) The database is evenly distributed across the disk
systems of the backends, and the backends process
transactions in parallel and concurrently.

The first two design requirements support performance
enhancement and capacity growth by adding new backends of
the same type and by using existing system software. With
the third requirement, performance gain (in terms of
response-time reduction) and capacity growth (in terms of
response-time invariance) of the system are likely to be in
proportion to the number of backends of the system.
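
The two notions of gain can be illustrated with an idealized model. The sketch below is our own simplification, not a measurement of MBDS: it assumes that a request scans the whole database, that every backend scans its even share at the same fixed rate, and that controller and bus overhead are ignored. The function name and all numbers are hypothetical.

```python
def ideal_response_time(db_records: int, backends: int, records_per_sec: float) -> float:
    """Idealized response time when each backend scans its even share in parallel."""
    if backends < 1:
        raise ValueError("at least one backend is required")
    share = db_records / backends      # records stored at each backend
    return share / records_per_sec     # seconds needed to scan one share


# Performance gain: same database, twice the backends, half the response time.
t_two = ideal_response_time(1_000_000, 2, 10_000.0)
t_four = ideal_response_time(1_000_000, 4, 10_000.0)

# Capacity growth: database and backends both doubled, response time unchanged.
t_grown = ideal_response_time(2_000_000, 4, 10_000.0)
```

Under these assumptions, response-time reduction and response-time invariance both follow directly from the even distribution of the database required by the third design requirement.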

2. Design Issues

There are several issues which must be resolved in
order to meet the design requirements of MBDS. The first
issue concerns the backend controller. As shown in Figure
1.1, the controller may become a primary bottleneck of the
system. In order to avoid this problem, the functions of
the controller should be minimized and reduced to the
pre-processing of the user transactions, the post-processing
of the transaction results, the sending and receiving of data
between the backends and the host, and the arbitration of
data insertion into the database.

The second design issue addresses the
characteristics and functionality of the communication bus
between the controller and the backends. The bus should be
cost-effective and efficient for both backend communication
and backend addition.

The third class of issues involves the backends of 
the system. The backends must have identical software to 
allow replication of the software on a new backend. 
Additionally, the backends must have complete software to 
perform all of the database management functions. These 
functions include directory management, concurrency control, 
record processing and communication. 

The fourth design issue concerns the database. The
database should be evenly distributed across all the disk
systems of the backends.

The fifth design issue is the choice of a data
model and data language. The data model should easily
support the required data distribution and the data
placement of the database. The data language for the system
is based on the chosen data model. It must
capture all of the primary operations of the database
system. The chosen data model is the attribute-based data
model and the data language is the attribute-based data
language.

The sixth design issue focuses on minimizing the
communications traffic of the system. The controller should
only communicate with the backends for sending the
pre-processed user transaction, for arbitrating the data
placement, and for receiving results. The backends should
only communicate with the controller for sending the results
of the user transactions. Communication among backends
should be held to a minimum.

The seventh issue deals with the directory placement
strategies. In order to enable each backend to perform all
the database management functions and to minimize the
communication among backends, the directory data are
duplicated at each backend.

B. THE UNDERLYING AND INTENDED HARDWARE

An overview of the MBDS hardware organization is shown in
Figure 2.1. User access is accomplished through a host
computer which in turn communicates with the controller.
When a transaction (either a request or a set of requests)
is received, the controller broadcasts the transaction
to all the backends. Since the data of all data files are
evenly distributed across all the backends, all backends can
execute the same request in parallel. A queue of
requests is maintained in each backend. When a backend
finishes executing one request, it sends the results of
that request to the controller and is able to start
executing the next request independently of the other
backends.

Figure 2.1 The MBDS Hardware Organization.

Originally, MBDS was designed to be configured with a
number of microprocessor-based processing units and their
disk subsystems, connected by a broadcast-based
communications line. When the implementation of MBDS began,
neither the microprocessor-based computers nor the
broadcast-based communications devices were available. The
present MBDS is configured with a VAX-11/780 (VMS OS) as
both the host and the controller and two PDP-11/44s (RSX-11M
OS) and their disk systems as the backends. Communication
between computers is accomplished by
time-division-multiplexed buses, known as parallel
communication links (PCLs). The broadcasting bus is
simulated by the PCL.

Currently, MBDS is being down-loaded to an initial
configuration of eight microprocessor-based,
broadcast-bus-connected, and Winchester-drive-supported
workstations, with one of the eight being used as the
controller and the others as the backends. Each workstation
(Sun-2/170, 4.2 BSD UNIX OS) has the Motorola MC68010 as the
CPU with 16 Mbytes of virtual space per process and uses
Ethernet as the broadcast bus among workstations. The disk
drives on the backends are Fujitsu Eagle Winchester-type
drives, with a formatted capacity of 380 Mbytes per drive.

C. THE DATA MODEL AND THE DATA LANGUAGE

In this section we first introduce the concepts and
terminology of the attribute-based data model, which is the
data model used in MBDS, and then describe the data language
in which users may issue requests to MBDS.

1. Attribute-based Data Model

MBDS uses the attribute-based data model as
its data model. In the attribute-based data model, data is
modeled with the constructs: database, file, record,
attribute-value pair (keyword), directory keyword,
directory, record body, keyword predicate, and query.
Informally, a database is a collection of files. Each file
contains a group of records which are characterized by a
unique set of directory keywords. A record is composed of
two parts. The first part is a collection of
attribute-value pairs or keywords. An attribute-value pair
is a member of the Cartesian product of the attribute name
and the value domain of the attribute. As an example,
<SALARY, 30000> is an attribute-value pair having 30000 as
the value for the attribute SALARY. All the attributes in a
record are required to be distinct. Certain
attribute-value pairs of a record (or a file) are called the
directory keywords of that record (file), because either the
attribute-value pairs or the ranges of their attribute
values are kept in the directory for addressing the record
(file). The rest of the record is textual information which
is referred to as the record body.

The angle brackets, <, >, enclose an attribute-value
pair. The curly brackets, {, }, include the record body.
The parentheses, (, ), form a record. The first
attribute-value pair of all records of a file is the same. In
particular, the attribute is FILE and the value is the file
name. An example of a record of the Employee file is shown
below:

(<FILE, Employee>, <JOB, Mgr>, <DEPT, Toy>, <SALARY, 30000>
{Employee Description})

The record has four keywords and a record body of employee
description.
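
As an illustration of these constructs, a record can be sketched as a small Python structure. This is only a sketch under our own assumptions: the class name `Record`, its fields and its checks are hypothetical and are not taken from the MBDS implementation.

```python
from dataclasses import dataclass


@dataclass
class Record:
    keywords: list          # attribute-value pairs, e.g. ("SALARY", 30000)
    body: str = ""          # the textual record body

    def __post_init__(self):
        names = [attribute for attribute, _ in self.keywords]
        # All the attributes in a record are required to be distinct.
        if len(names) != len(set(names)):
            raise ValueError("attributes in a record must be distinct")
        # The first attribute-value pair names the file of the record.
        if not names or names[0] != "FILE":
            raise ValueError("the first keyword must be <FILE, file name>")

    def value(self, attribute):
        """Return the value paired with `attribute`, or None if absent."""
        for name, val in self.keywords:
            if name == attribute:
                return val
        return None


employee = Record(
    [("FILE", "Employee"), ("JOB", "Mgr"), ("DEPT", "Toy"), ("SALARY", 30000)],
    body="Employee Description",
)
```

Here `employee.value("SALARY")` looks up the value of one keyword, and constructing a record with a repeated attribute name raises an error.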

A keyword predicate, or simply predicate, is of the
form

(attribute, relational operator, value).

Where no confusion arises, we also use parentheses to enclose
a predicate. A relational operator can be one of (=, !=, <,
=<, >, >=). For example, (SALARY > 20000) is a predicate. A
keyword K is said to satisfy a predicate T if the attribute
of K is identical to the attribute in T and the relation
specified by the relational operator of T holds between the
value of K and the value in T. For example, the keyword
<SALARY, 30030> satisfies the predicate (SALARY > 20000).

A query consists of several keyword predicates in
disjunctive normal form. An example of a query is:

((DEPT=Toy) and ((SALARY<10000) or (SALARY>20000))).
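
The definitions of predicate satisfaction and of a query can be sketched in Python as follows. This is a minimal sketch under our own assumptions: a record is a list of (attribute, value) pairs, a query is a list of conjunctions (each a list of predicates), and the function names and sample values are hypothetical.

```python
import operator

# Relational operators, written as in the text.
OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
       "=<": operator.le, ">": operator.gt, ">=": operator.ge}


def keyword_satisfies(keyword, predicate):
    """True if the attributes match and the relation holds between the values."""
    k_attr, k_val = keyword
    p_attr, op, p_val = predicate
    return k_attr == p_attr and OPS[op](k_val, p_val)


def record_satisfies(record, query):
    """A record satisfies a DNF query if all predicates of some conjunction hold."""
    return any(
        all(any(keyword_satisfies(kw, pred) for kw in record) for pred in conj)
        for conj in query
    )


record = [("FILE", "Employee"), ("DEPT", "Toy"), ("SALARY", 30000)]
query = [
    [("DEPT", "=", "Toy"), ("SALARY", ">", 20000)],   # first conjunction
    [("DEPT", "=", "Shoe")],                          # second conjunction
]
matches = record_satisfies(record, query)
```

Because the attribute names are compared first, values of different attributes (and hence possibly of different domains) are never compared with each other.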

2. The Attribute-based Data Language

The data manipulation language for MBDS, the
attribute-based data language (ABDL), is a non-procedural
language which originally supports four primary database
operations: RETRIEVE, INSERT, DELETE and UPDATE. It is the
purpose of this thesis to design and implement the fifth
primary database operation, the RETRIEVE-COMMON operation.

The RETRIEVE request is used to retrieve records of
the database. The syntax of a RETRIEVE request is shown
below:

RETRIEVE Query (Target-List) [BY Attribute] [WITH Pointer]

The query specifies which records are to be retrieved. The
target-list is a list of output attributes. It may also
consist of an aggregate operator on one or more output
attributes. MBDS supports five aggregate operators:
AVG, COUNT, SUM, MIN and MAX. The BY-clause and the
WITH-clause are optional. The BY-clause may be used to group
records when an aggregate operation is specified. The
WITH-clause may be used to specify whether pointers to the
retrieved records must be returned to the user or user
program for later use in an update request. Some examples of
retrieve requests are shown below.

Example 1. Retrieve the names of all employees who work in
the Toy department.

RETRIEVE ((FILE=Employee) and (DEPT=Toy)) (NAME)

Example 2. List the average salary of all departments.

RETRIEVE (FILE=Employee) (AVG(SALARY)) BY DEPT
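
The meaning of the two examples can be sketched in Python. The sketch assumes records held in memory as dictionaries; the helper `retrieve_avg_by` and the sample employees are hypothetical and only illustrate the target-list and the BY-clause, not the MBDS implementation.

```python
from collections import defaultdict


def retrieve_avg_by(records, file_name, target, by):
    """Average `target` over the records of `file_name`, grouped on `by`."""
    groups = defaultdict(list)
    for rec in records:
        if rec.get("FILE") == file_name and target in rec and by in rec:
            groups[rec[by]].append(rec[target])
    return {key: sum(values) / len(values) for key, values in groups.items()}


employees = [
    {"FILE": "Employee", "NAME": "Ann", "DEPT": "Toy", "SALARY": 30000},
    {"FILE": "Employee", "NAME": "Bob", "DEPT": "Toy", "SALARY": 20000},
    {"FILE": "Employee", "NAME": "Cy", "DEPT": "Shoe", "SALARY": 25000},
]

# Example 1: the names of all employees who work in the Toy department.
toy_names = [r["NAME"] for r in employees
             if r["FILE"] == "Employee" and r["DEPT"] == "Toy"]

# Example 2: the average salary of each department.
avg_salary = retrieve_avg_by(employees, "Employee", "SALARY", "DEPT")
```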

The INSERT request is used to insert a record into
the database. The syntax of an INSERT request is:

INSERT Record

The following example inserts a record into the Employee
file.

INSERT (<FILE, Employee>, <SALARY, 30000>, <DEPT, Toy>)

The syntax of a DELETE request is:

DELETE Query

where the query specifies the record(s) to be removed from
the database. The following example deletes records from
the Employee file.

DELETE ((FILE=Employee) and (SALARY=30000) and (DEPT=Toy))

The UPDATE request is used to modify records of the
database. The syntax of the UPDATE request is:

UPDATE Query <Modifier>

where the query specifies the particular records of the
database to be updated and the modifier specifies the
kinds of modification that need to be done on records that
satisfy the query. The following example gives a $1000
raise to all employees.

UPDATE (FILE=Employee) <SALARY=SALARY+1000>
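
The effect of the INSERT, DELETE and UPDATE examples can be sketched on an in-memory list of records. This is an illustration only: the functions `insert`, `delete` and `update`, the use of Python callables for the query and the modifier, and the second sample record are our own assumptions.

```python
def insert(db, record):
    """INSERT Record: add one record to the database."""
    db.append(record)


def delete(db, query):
    """DELETE Query: remove every record satisfying the query; return the count."""
    kept = [rec for rec in db if not query(rec)]
    removed = len(db) - len(kept)
    db[:] = kept
    return removed


def update(db, query, modifier):
    """UPDATE Query <Modifier>: modify every record satisfying the query."""
    changed = 0
    for rec in db:
        if query(rec):
            modifier(rec)
            changed += 1
    return changed


def raise_salary(rec):
    rec["SALARY"] = rec["SALARY"] + 1000     # <SALARY=SALARY+1000>


db = []
insert(db, {"FILE": "Employee", "SALARY": 30000, "DEPT": "Toy"})
insert(db, {"FILE": "Employee", "SALARY": 25000, "DEPT": "Shoe"})

removed = delete(db, lambda r: r["FILE"] == "Employee"
                 and r["SALARY"] == 30000 and r["DEPT"] == "Toy")
updated = update(db, lambda r: r["FILE"] == "Employee", raise_salary)
```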

The RETRIEVE-COMMON request is used to merge two
files by common attributes. It is discussed in detail
in the later chapters.

D. THE PROCESS STRUCTURE 

MBDS is a message-oriented system. In a
message-oriented system, each process corresponds to one
system function. These processes communicate among
themselves by passing messages. The processes are created at
system start time and exist until the system is stopped.
Figure 2.2 provides an overview of the MBDS process structure.

1. The Communication Processes

Communication between computers in MBDS is achieved
by using the PCL. MBDS provides a software abstraction of
this bus for each computer in order to emulate broadcast
capabilities. The abstraction consists of two complementary
processes. The first process, get-pcl, gets messages from
other computers off the PCL. The second process, put-pcl,
puts messages on the bus to be broadcast to other
computers. Every computer, whether it is the controller or a
backend, has its own get-pcl and put-pcl.

Figure 2.2 The MBDS Process Structure.

There are 31 message types and one general message
format used in the MBDS message-passing facilities. The
format (shown in Figure 2.3) is used for each of the three
message-passing facilities, namely, messages within the
controller, messages within the backends, and messages
between computers.

    A Message           Data Type
    ----------------    --------------------------------
    Message Type        a numeric code
    Message Sender      a numeric code
    Message Receiver    a numeric code
    Message Text        an alphanumeric field terminated
                        by an end of message marker

Figure 2.3 The General Format of MBDS Messages.
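
The general message format of Figure 2.3 can be sketched as a small Python type. Only the four fields come from the figure. The comma-separated wire layout, the marker character and the names `Message`, `encode` and `decode` are hypothetical choices made for this illustration.

```python
from dataclasses import dataclass

END_OF_MESSAGE = "\x03"   # hypothetical end-of-message marker


@dataclass(frozen=True)
class Message:
    msg_type: int    # message type, a numeric code
    sender: int      # message sender, a numeric code
    receiver: int    # message receiver, a numeric code
    text: str        # alphanumeric message text

    def encode(self) -> str:
        """Serialize the four fields and terminate the text with the marker."""
        if END_OF_MESSAGE in self.text:
            raise ValueError("message text must not contain the marker")
        return f"{self.msg_type},{self.sender},{self.receiver},{self.text}{END_OF_MESSAGE}"

    @staticmethod
    def decode(wire: str) -> "Message":
        """Rebuild a message from its serialized form."""
        if not wire.endswith(END_OF_MESSAGE):
            raise ValueError("missing end-of-message marker")
        payload = wire[:-len(END_OF_MESSAGE)]
        # Split only three times so that commas inside the text are preserved.
        msg_type, sender, receiver, text = payload.split(",", 3)
        return Message(int(msg_type), int(sender), int(receiver), text)


request = Message(1, 0, 2, "RETRIEVE (FILE=Employee) (NAME, DEPT)")
round_trip = Message.decode(request.encode())
```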

Messages between computers are divided into two classes:
messages between backends and messages between the
controller and the backends. Figure 2.4 describes each of the
MBDS message types.

2. The Test Interface Process

The test interface process allows the user to
interact with MBDS directly. Since MBDS does not use a
host computer, the test interface process is contained in
the controller.

3. The Processes of the Controller

In addition to the communications and test-interface
processes, the controller consists of three additional
processes: Request Preparation (RP), Insert Information
Generation (IIG) and Post Processing (PP). RP receives,
parses and formats a request (transaction) before sending
the formatted request (transaction) to the
directory-management process in each backend. IIG is used
to provide additional information to the backends when an
insert request is received. PP is used to collect all the
results of a request (transaction) and forward the results
to the user.

4. The Processes of Each Backend

In addition to the communication processes, each
backend also consists of three other processes: Record
Processing (RP), Directory Management (DM) and Concurrency
Control (CC).

DM controls the execution of a request at a backend
and accesses the secondary-storage-based directory tables.
It determines the disk addresses where the relevant data of
a particular request are stored and then sends those disk
addresses to RP.

CC is used to ensure the consistency of the database
while allowing concurrent execution of multiple requests.

RP performs the disk I/O operations and other
operations specified by the request. It receives the
secondary-storage addresses from DM, which processes the
request.
The results are then forwarded to the controller.

    MESSAGE-TYPE NUMBER AND NAME

     1  TRAFFIC UNIT
     2  REQUEST RESULTS
     3  NUMBER OF REQUESTS IN A TRANSACTION
     4  AGGREGATE OPERATORS
     5  REQUESTS WITH ERRORS
     6  PARSED TRAFFIC UNIT
     7  NEW DESCRIPTOR ID
     8  BACKEND NUMBER
     9  CLUSTER ID
    10  REQUEST FOR NEW DESCRIPTOR ID
    11  BACKEND RESULTS FOR A REQUEST
    12  BACKEND AGGREGATE OPERATOR RESULTS
    13  RECORD THAT HAS CHANGED CLUSTER
    14  RESULTS OF A RETRIEVE OR FETCH

Figure 2.4 The MBDS Message Types.

III. DESIGN AND ANALYSIS OF THE RETRIEVE-COMMON REQUEST



In this chapter, we introduce the terminology and
notation of the "Retrieve-Common" request, investigate and
analyze several possible design and implementation
approaches, and then select the best one to design and
implement the Retrieve-Common operation for MBDS. The
selection of an approach is based on the design
requirements and the design issues of MBDS.

A. THE INTENDED OPERATION

1. An Operation On Two Files

The RETRIEVE-COMMON request is used to merge two
files by common attribute values. The common attribute
values are the attribute values which belong to the records
of both files. For example, suppose there are two files:
file A and file B. File A contains the records of the
street names of the city of San Jose:

(<FILE, A>, <STREET, MONTEREY>, <CITY, SAN JOSE>)
(<FILE, A>, <STREET, SECOND>, <CITY, SAN JOSE>)

File B consists of the records of city names of Monterey
County:

(<FILE, B>, <CITY, MONTEREY>, <COUNTY, MONTEREY>)
(<FILE, B>, <CITY, SEASIDE>, <COUNTY, MONTEREY>)

The RETRIEVE-COMMON request can provide us with a third file,
say, file C, with information such as: "All the records
of both files A and B, where the street name of the records
in file A is identical to the city name of the records in
file B." One of the records in file C which satisfy the
request would be

(<FILE, C>, <FILE, A>, <STREET, MONTEREY>, <CITY, SAN JOSE>,
<FILE, B>, <CITY, MONTEREY>, <COUNTY, MONTEREY>).

Logically, the retrieve-common request involves two
retrieval operations. We define the first retrieval
operation as the source retrieve and the second retrieval
operation as the target retrieve. The set of all the
records that belong to the result of the source retrieve is
called the source record set. The set of all the records
that belong to the result of the target retrieve is called
the target record set. A source (target) record is a
record that belongs to the source (target) record set.
Similarly, their attributes are referred to as source
attributes and target attributes. The merged source and
target records are termed the result record set. The
aforementioned file C is a result record set.

We term the source and target attribute names that
participate in the retrieve-common operation the join
attribute names, or briefly join attributes. Their
values are termed common attribute values, or simply common
values. The retrieve-common operation requires that the
join attribute which is specified in the source record set
must have the same domain as that of the join attribute in
the target record set, although they need not have the same
attribute name.

Consider another example. Suppose the source records
are characterized by the attributes Employee_name and Wages,
and the target records are characterized by Rank and Wages.
Further, let the domain of Employee_name be the
character string and the domain of both Rank and Wages be
the integer. A retrieve-common operation may be performed
by merging on the wages of the
respective source record and the target record. A
retrieve-common operation may also be performed by merging
on the wages of the source record and the ranks of the
target record, since their value domains are the same.
However, a merge between the employee names and the ranks
would not be permitted, since their domains are different.
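
The domain requirement on join attributes can be sketched as a simple check. The table `DOMAINS` and the function `can_join` are hypothetical names used only for this illustration, with Python types standing in for the value domains of the example.

```python
# Value domains of the attributes in the example (Python types stand in
# for the character-string and integer domains).
DOMAINS = {"Employee_name": str, "Wages": int, "Rank": int}


def can_join(source_attr, target_attr, domains=DOMAINS):
    """Join attributes must share a domain; their names need not be identical."""
    return domains[source_attr] is domains[target_attr]


wages_with_wages = can_join("Wages", "Wages")
wages_with_rank = can_join("Wages", "Rank")
name_with_rank = can_join("Employee_name", "Rank")
```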

The logical operation for the retrieve-common
request can be described as follows.

(1) All records satisfying the source retrieve are
collected.

(2) All records satisfying the target retrieve are
collected.

(3) The records of the two collections are pairwise merged
on the common (source and therefore target) attribute
values.
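
The three logical steps can be sketched in Python using files A and B from the earlier example. This is a logical illustration on a single in-memory list, not the distributed MBDS design; the function `retrieve_common` and its parameters are hypothetical. A merged record is kept as a (source, target) pair because the two records may share attribute names such as FILE and CITY.

```python
def retrieve_common(records, source_query, source_attr, target_query, target_attr):
    """Logical retrieve-common: collect both record sets, then merge pairwise."""
    source_set = [r for r in records if source_query(r)]      # step (1)
    target_set = [r for r in records if target_query(r)]      # step (2)
    result = []
    for s in source_set:                                      # step (3)
        for t in target_set:
            common = s.get(source_attr)
            if common is not None and common == t.get(target_attr):
                result.append((s, t))
    return result


database = [
    {"FILE": "A", "STREET": "MONTEREY", "CITY": "SAN JOSE"},
    {"FILE": "A", "STREET": "SECOND", "CITY": "SAN JOSE"},
    {"FILE": "B", "CITY": "MONTEREY", "COUNTY": "MONTEREY"},
    {"FILE": "B", "CITY": "SEASIDE", "COUNTY": "MONTEREY"},
]

file_c = retrieve_common(
    database,
    lambda r: r["FILE"] == "A", "STREET",    # source retrieve and join attribute
    lambda r: r["FILE"] == "B", "CITY",      # target retrieve and join attribute
)
```

With this data, the only merged pair joins the street MONTEREY of file A with the city MONTEREY of file B.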



2. The Syntax of the Retrieve-Common Request

When developing the syntax of the retrieve-common
request, we must attempt to design a data language construct
that is similar, syntactically, to the other primary
operations of ABDL. In particular, the syntax of the
retrieve-common operation should resemble the syntax of the
ABDL retrieve operation given below:

RETRIEVE Query (Target-list) [BY Attribute] [WITH Pointer]

Using the above syntax as a guideline, we define the syntax
for the retrieve-common request as follows.

RETRIEVE Query-1 (Target-list-1) [BY Attribute] [WITH Pointer]
COMMON (Attribute-1, Attribute-2)
RETRIEVE Query-2 (Target-list-2) [BY Attribute] [WITH Pointer]

The retrieve-common request consists of three parts.
The first part is what we have referred to as the source
retrieve request, which retrieves the source record set.
The second part is the specification of the join attributes,
where Attribute-1 belongs to the source record and
Attribute-2 belongs to the target record. Although the
values of these two attributes must be the same in order to
satisfy the condition for merging the respective records,
their attribute names need not be identical. The third part
is what has been referred to as the target retrieve request,
which retrieves the target record set.

B. AN ANALYSIS OF DIFFERENT DESIGNS

In order to make this thesis self-contained, several
possible design approaches described in [Ref. 8] are
reviewed in this section.

The main issue when considering alternative strategies
for implementing the retrieve-common request is where the
merge of the source and the target records should be
performed.

There are three major alternatives for distributing the
workload of the retrieve-common request.

(1) The controller does all of the merge operation.

(2) The controller and the backends share the workload of
the merge.

(3) The backends do all of the merge operation.

Each of these alternatives will be analyzed and judged using
the design requirements and design issues of MBDS.

In order to simplify the analysis of design (or
implementation) strategies, we make the following
assumptions.

(1) The records of the source record set and the records
of the target record set are distributed evenly across
the backends.

(2) The operation of the retrieve-common is performed as
described in the previous section.

1. The Controller Does All the Merge Operation

In this alternative, each backend only performs
the two retrieval operations and then sends the records of
the source record set and the records of the target record
set to the controller. Upon receiving all the source records
and target records from all the backends, the controller
performs the merging operation and sends the results to the
host computer.

2. The Controller and the Backends Share the Merge Operation

Each backend performs the merge operation over its
source records and target records. The merged records, along
with the source and target record sets, are then sent to the
controller. The controller performs the merge operation
over the source and target record sets coming from different
backends and then sends the results together with the
previously merged records (done by individual backends) to
the host.

3. The Backends Do All the Merge Operation

This alternative may be further broken into two
subalternatives.

(a) The backends share the merge operation.
The backends send either source or target records to
each other. Let us assume that the target records are
sent. Each backend will have a portion of the source
record set and the whole set of target records. Then,
each backend performs the merge operation over its own
source records and all of the target records, and
sends the results to the controller.

(b) One designated backend performs the merge operation.
All records of both the source record set and the
target record set are sent to the designated backend
from all of the other backends. The designated
backend performs the entire merge operation and sends
the results to the controller.
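
Subalternative (a) can be sketched as a sequential simulation. The sketch is illustrative only: in MBDS the backends would run in parallel and exchange records over the bus, whereas here the exchange is modeled by concatenating lists. The function name and the sample records are hypothetical.

```python
def merge_shared_by_backends(source_parts, target_parts, source_attr, target_attr):
    """source_parts[i] and target_parts[i] are the records stored at backend i."""
    # Every backend sends its local target records to the others, so each
    # backend ends up holding the whole target record set.
    all_targets = [t for part in target_parts for t in part]

    per_backend_results = []
    for local_sources in source_parts:       # one iteration per backend
        local_result = [(s, t) for s in local_sources for t in all_targets
                        if s[source_attr] == t[target_attr]]
        per_backend_results.append(local_result)   # sent to the controller

    # The controller only collects the results; it performs no merging.
    return [pair for local in per_backend_results for pair in local]


source_parts = [[{"STREET": "MONTEREY"}], [{"STREET": "SECOND"}]]
target_parts = [[{"CITY": "SEASIDE"}], [{"CITY": "MONTEREY"}]]
merged = merge_shared_by_backends(source_parts, target_parts, "STREET", "CITY")
```

In this example the matching source and target records reside on different backends, which is why the target records must be exchanged before the local merges can produce the complete result.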

4. An Analysis of the Design Approaches

Four alternatives for distributing the workload of
the merge operation among the controller and the backends
have been discussed in the previous subsections. We now
examine these alternatives against the design goals of MBDS.

Alternative 1, where the controller performs the
entire merge operation, will increase the workload of the
controller. Recall that in chapter II we stressed that,
in order to reduce the chance of the controller being the
bottleneck of the system, we minimize the work of the
controller. Alternative 1 violates this design requirement.
Therefore, it will not be considered further.

Alternative 2 will increase the communications load
and increase the workload of the controller. This
alternative complicates the first and the sixth design
issues of MBDS. Therefore, it will also be eliminated from
the design consideration.

Alternative 3a meets the design issue of minimizing
the controller function and distributes the workload to
each backend evenly. Alternative 3b does not increase the
workload of the controller, nor does it distribute the
workload to each backend. Furthermore, transmitting all the
records of both the source record set and target record set
will increase the communications overhead. In addition,
performing the entire merge operation in one backend will
unbalance the workload, thereby reducing the parallelism of
the backends, i.e., by having a single backend do the
merge and all other backends idle. This complicates both
the third and sixth design issues, so this alternative is
also eliminated.

With this analysis we choose alternative 3a as
our design approach. That is, each backend performs the
merge with its portion of source records and all
target records, and then sends its result to the
controller. The controller forwards the final result to the
host computer.

C. AN ANALYSIS OF DIFFERENT IMPLEMENTATIONS 

Three different implementations for merging the source 
and the target record sets are considered. 

(1) A straightforward implementation. 

(2) An implementation based on sorting and matching. 

(3) An implementation based on bucket-hashing. 

1. The Straightforward Implementation 

The concept of this alternative is very simple and 
the merging operation is based on the "nest-loop" algorithm 
[Ref. 8: p. 86] which is shown in Figure 3.1. 

This alternative is accomplished in five phases: 

(1) Each backend determines its own source records and 
stores them into a predefined portion of the secondary 
storage area. 

(2) Each backend determines its own target records and 
stores them into the predefined portion of the 
secondary storage area. 

PROCEDURE Nest_loop_merge 

FOR each record in the source record set DO 
    FOR each record in the target record set DO 
        IF the merging condition is satisfied 
        THEN 
            form a result record 
        END IF 
    END FOR 
END FOR 

END PROCEDURE Nest_loop_merge 

Figure 3.1 The Nest-loop Merge Procedure. 

(3) Each backend broadcasts its own local target records 
to all of the other backends. 

(4) Each backend receives the broadcast target records 
from the other backends and stores them into the 
secondary storage together with its own target 
records. 

(5) Each backend brings its own source records and the 
entire target record set into the primary memory, 
performs the "nest-loop" merging operation and then 
sends the merged results to the controller. 
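
The merging step of phase (5) can be sketched in Python as follows. This is a minimal illustration rather than the MBDS code: we assume that records are held in memory as dictionaries and that the merging condition is equality on a single common attribute. The names nest_loop_merge and attr are hypothetical.

```python
def nest_loop_merge(source, target, attr):
    """Compare every source record with every target record.

    Assumption: the merging condition is equality on the common
    attribute `attr`; a result record is the union of both records.
    """
    results = []
    for s in source:                      # outer loop: source record set
        for t in target:                  # inner loop: target record set
            if s[attr] == t[attr]:        # merging condition satisfied
                results.append({**s, **t})
    return results


if __name__ == "__main__":
    source = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
    target = [{"id": 2, "city": "x"}, {"id": 3, "city": "y"}]
    print(nest_loop_merge(source, target, "id"))
```

Every source record is compared with every target record, so the number of comparisons is the product of the two set sizes.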

2. The Implementation by Sorting and Matching 

The idea of this implementation is based on the 
following inference. 

Since the retrieve-common operation is simply a merging 
operation on two files of records, if we can have 
these two files presorted by the values of their common 
attributes then the merging operation may be efficiently 
performed by matching the values of the common 
attributes of the records of these two files. 

There are two possible alternatives to perform the 
sort-match algorithm. 

(a) The backends do all of the sorting and matching 
operations. 

(b) The backends and the controller share the sorting and 
matching operations. 

Alternative (b) will increase the workload of the 
controller and contradict the design goals of MBDS, and 
is therefore eliminated from consideration. Only 
alternative (a) will be examined. Alternative (a) 
accomplishes the retrieve-common operation in four phases. 

(1) Each backend retrieves, sorts and stores its own source 
records and target records separately, and then 
broadcasts either set of records to the other 
backends. (Let us assume that the target records are 
transmitted.) 

(2) Each backend receives and merges the incoming 
non-local target records into its own local target 
records. 

(3) Each backend performs the matching operation over its 
own portion of source records and the entire set of 
target records (from all the backends) . 

(4) The backends send the results to the controller. 
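
The matching operation of phase (3) can be sketched as a single pass over two sorted lists. The sketch below is only an illustration under stated assumptions: it sorts in memory with Python's built-in sorted (an actual backend would need an external sort), records are dictionaries, and sort_match_merge and attr are hypothetical names.

```python
def sort_match_merge(source, target, attr):
    """Sort both record sets on `attr`, then match them in one pass."""
    src = sorted(source, key=lambda r: r[attr])
    tgt = sorted(target, key=lambda r: r[attr])
    results = []
    i = j = 0
    while i < len(src) and j < len(tgt):
        a, b = src[i][attr], tgt[j][attr]
        if a < b:
            i += 1                        # source value too small, advance
        elif a > b:
            j += 1                        # target value too small, advance
        else:
            # Find the run of target records sharing this value.
            j_end = j
            while j_end < len(tgt) and tgt[j_end][attr] == a:
                j_end += 1
            # Pair every source record having this value with the run.
            while i < len(src) and src[i][attr] == a:
                for t in tgt[j:j_end]:
                    results.append({**src[i], **t})
                i += 1
            j = j_end
    return results
```

Once both sets are sorted, each record is visited once, apart from the pairing of records that share a common attribute value.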

3. The Implementation Based on Bucket-Hashing 

This implementation strategy attempts to speed up 
the comparison and merge by hashing records into small 
groups (the buckets of the hashing table) which contain 
records with common attribute values, so that the time 
complexity of the merging operation may be reduced. 
A hashing function applied to the common attribute 
values is used to hash records into buckets. The bucket 
numbers are consecutive integers. Instead of using primary 
and overflow areas, the buckets use one or more fixed-size 
blocks to store records. The number of blocks may vary 
among buckets. Details of the hashing table, the buckets 
and the blocks will be described in the next chapter. 

Those source records and target records within the 
same bucket will be examined and merged if the merging 
condition is matched. This alternative can also be broken 
into two subalternatives. 

(a) One common hashing table is used for both source and 
target record sets. 

(b) Two separate tables are used, one for each record set. 

a. One Common Hashing Table 

This alternative is accomplished by each backend 
in four phases: 

(1) All local source records will be hashed and stored 
into blocks according to their hashed values. These 
blocks (therefore buckets) are termed source blocks 
(buckets). 

(2) After all the local source records have been hashed, 
the local target records are hashed one at a time and 
buffered. If the target record is hashed into an 
empty source bucket, then it is buffered for 
transmitting to other backends. Otherwise, all the 
records in the source bucket will be retrieved and 
merged with that target record only if the merging 
condition is satisfied. The results are first 
buffered and then sent to the controller. 

(3) Since the non-local target records may arrive at a 
backend while the backend is processing some other 
records, each backend will place these incoming 
records on a predefined secondary storage area. 

(4) Each backend retrieves the non-local target records 
from the secondary storage area and processes them in 
the same way as the backend does on its local 
target records. 

b. Separate Hashing Tables 

This alternative is accomplished in three phases. 

(1) The backends will hash and store their own source 
records and target records into two separate hashing 
tables by a common hashing function. After all of the 
target records have been hashed and stored, each 
backend will broadcast the hashed results of its 
target records (i.e., the bucket number and the 
records associated with that bucket number) to all of 
the other backends. 

(2) Upon receiving all of the target information from the 
other backends, each backend stores those target 
records into appropriate buckets according to their 
bucket numbers. 

(3) The backends perform the merge operation on the local 
source records and the entire set of target records 
and send the results to the controller. The procedure 
is shown in Figure 3.2. 
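
The bucket-by-bucket merge with two separate tables can be sketched as follows. This is a simplified, single-backend illustration and not the MBDS implementation: the tables live in memory, Python's built-in hash stands in for the common hashing function, and NUM_BUCKETS, build_table and hashing_merge are hypothetical names.

```python
from collections import defaultdict

NUM_BUCKETS = 8  # assumed small value, chosen only for the example


def build_table(records, attr, num_buckets=NUM_BUCKETS):
    """Hash each record on `attr` into a bucket of its own table."""
    table = defaultdict(list)
    for r in records:
        table[hash(r[attr]) % num_buckets].append(r)
    return table


def hashing_merge(source, target, attr, num_buckets=NUM_BUCKETS):
    """Merge two record sets one bucket at a time."""
    src_table = build_table(source, attr, num_buckets)
    tgt_table = build_table(target, attr, num_buckets)
    results = []
    for b in range(num_buckets):
        # Only buckets that are non-empty in both tables can match.
        if b in src_table and b in tgt_table:
            for s in src_table[b]:
                for t in tgt_table[b]:
                    if s[attr] == t[attr]:   # different values may collide
                        results.append({**s, **t})
    return results
```

Records with equal common attribute values always fall into the same bucket number in both tables, so the nested comparison is confined to one bucket at a time.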

PROCEDURE Hashing_merge 

FOR the bucket_value = min_value TO max_value DO 
    IF the buckets of both tables are not empty 
    THEN 
        retrieve all the records from both buckets 
        perform merge operation based on 
        the straightforward algorithm 
    END IF 
END FOR 

END PROCEDURE Hashing_merge 

Figure 3.2 The Hashing_merge Procedure. 

4. A Comparison of the Three Implementation Approaches 

In this section we compare and analyze these 
implementation approaches. Since the backends work in 
parallel, our analysis only focuses on how much time it 
takes for one backend to carry out one particular strategy. 
There are common operations that each backend performs, so 
the time complexities for these operations can be ignored 
when comparing the implementation strategies. The times of 
these common operations are: 

(1) the time to process the records for the source request, 
which involves determining which records of the 
database satisfy the query, projecting the 
attribute-value pairs of the target-list of the 
satisfied records and forming a source record set; 

(2) the time to process the records for the target 
request, which involves determining which records of 
the database satisfy the query, projecting the 
attribute-value pairs of the target-list of the 
satisfied records and forming a target record set; 

(3) the time to broadcast the local target records to the 
other backends; and 

(4) the time to send the merged results to the controller. 

The following notations are introduced to simplify the 
ensuing analysis. 

Cs : Cardinality of the source record set in one backend. 

Ct : Cardinality of the target record set in one backend. 

Cb : Average number of records in a bucket. 

M : Number of Backends. 

B : Number of Index Entries in the hashing table. 

Ti : Average time to read (write) a block of records from 
(to) secondary storage. 

Tb : Average time to read (write) a record from (to) a 
bucket. 

Tc : Average time to compare the common attribute values 
of two records. 

Th : Average time to hash a record. 

Tm : Average time to merge two records. 

a. An Analysis for the Straightforward Implementation 

We recall that there are five phases in this 
implementation as discussed in a previous section. 

Phase 1: Since there are Cs local source 
records in each backend, the time complexity for storing 
them into the secondary storage is: 

Ti*(Cs/Cb). 

Phase 2: Since there are Ct local target 
records in each backend, the time complexity for storing 
them into secondary storage is: 

Ti*(Ct/Cb). 

Phase 3: The time complexity for this phase is 
ignored. 

Phase 4: Since each backend receives (M-1)*Ct 
target records from the other backends, the time complexity 
for storing them in the secondary storage is: 

(M-1)*(Ct/Cb)*Ti. 

Phase 5: Records are merged in this phase. 

There are Cs source records and M*Ct target records in each 
backend. Each block of the source records is compared and 
merged with all of the target records. It takes Ti to bring 
one block of source records into the primary memory from the 
secondary storage and M*(Ct/Cb)*Ti for the entire target 
record set. 

It takes Cb*Tb to access one block of source 
records and M*Ct*Tb to access all of the target records. 
The time complexity for comparing one block of the source 
records and all of the target records is 

Cb*M*Ct*Tc. 

We further assume that a fraction k of the target 
records participates in the merging operation. The time 
complexity for merging one block of source records and all 
of the target records becomes: 

k*M*Ct*Tm. 

The total time complexity for processing one block of source 
records in this implementation is: 

[Ti + M*(Ct/Cb)*Ti] + (Cb + M*Ct)*Tb + (Cb*M*Ct*Tc) + (k*M*Ct*Tm). 

There are Cs/Cb blocks of source records in each 
backend; therefore, the time complexity of this alternative 
is: 

(Cs/Cb) * {[Ti + M*(Ct/Cb)*Ti] + (Cb + M*Ct)*Tb 
+ (Cb*M*Ct*Tc) + (k*M*Ct*Tm)} 
or 

(M*Cs*Ct) * [Ti/(Cb*Cb) + (Tb + k*Tm)/Cb + Tc] + Ti*(Cs/Cb) + Cs*Tb. 

Because Cs may be equal to Ct and M is a small constant, the 
time complexity may be further simplified to be 

O(Cs*Ct) or 
O(Cs^2). 



b. An Analysis for the Sort-Matching Implementation 

We will analyze each phase of this 
implementation approach. 

Phase 1: Each backend sorts its two record sets 
and broadcasts the sorted target record set to the other 
backends. Due to the large size of the record sets, the 
sorting operation cannot be done by using an internal 
sorting algorithm. There are several external sorting 
algorithms which can sort the local source records and the 
local target records with the time complexities of 
O(Cs*(log Cs)) and O(Ct*(log Ct)), respectively. However, 
these algorithms all have some limitations: either using a 
special hardware configuration or running different 
software among processors [Refs. 9,10]. 

Because we do not want to put limitations on the 
hardware configuration of MBDS or to use different software 
among the backends, this alternative is eliminated from our 
consideration. 

c. An Analysis for the Bucket-Hashing Implementation 

In order to further simplify our analysis, we 
assume that the local source records and target records can 
be evenly hashed across all the buckets of the hashing 
tables and each bucket will contain only one block of local 
source records or one block of local target records. First, 
we analyze the alternative that uses only one hashing table. 

Phase 1: Each source record needs to be hashed and 
written into a bucket by its hashed value. This includes 
getting the block of that bucket from the secondary storage, 
writing the record into the block and returning the 
block to the secondary storage. Therefore, the time 
complexity for each backend to hash and store the source 
records is: 

Cs*(Th + Tb + 2Ti). 

Phase 2: Every time a target record is hashed, 
the bucket with that hashed value is checked. If the bucket 
is not empty, then all the source records in that bucket 
will be retrieved into the primary memory, compared with the 
target record and merged with it if their common attribute 
values are equal. The time complexity for bringing one bucket 
(block) of source records into primary memory is Ti. The 
time complexity for accessing those source records from the 
block and comparing them with that target record is: 

Cb*(Tb + Tc). 

Suppose that the probability of hashing a target record into 
a non-empty bucket is p and the probability of satisfying 
the merging condition is f; then the time complexity for 
each backend to process one local target record is: 

Th + p*[Ti + Cb*(Tb + Tc + f*Tm)]. 

Because we assume the source records are evenly hashed 
across the buckets of the hashing table, p is equal to 1. 
There are Ct local target records in each backend so that 
the time complexity for each backend to process its local 
target records is: 

Ct*{Th + [Ti + Cb*(Tb + Tc + f*Tm)]}. 

Phase 3: Each backend receives (M-1)*Ct target 
records from other backends. The time complexity for 
storing these records back to the secondary storage is: 

(M-1)*(Ct/Cb)*Ti. 

Phase 4: It takes (M-1)*(Ct/Cb)*Ti for each backend 
to retrieve all the non-local target records from the 
secondary storage into the primary memory. The time 
complexity for processing these records is: 

(M-1)*Ct*{Th + [Ti + Cb*(Tb + Tc + f*Tm)]}. 

The time complexity of this phase is: 

(M-1)*(Ct/Cb)*Ti + (M-1)*Ct*{Th + [Ti + Cb*(Tb + Tc + f*Tm)]}. 

The total time complexity of this alternative 
for a backend is: 

Cs*(Th + Tb + 2Ti) + 2*(M-1)*(Ct/Cb)*Ti 
+ M*Ct*{Th + [Ti + Cb*(Tb + Tc + f*Tm)]}. 

Now, we analyze the other alternative which uses 
two separate hashing tables. 

Phase 1: The source records and the target 
records will be hashed, grouped into the buckets of separate 
hashing tables and then placed onto the secondary storage. 
The time complexity for each backend to process its local 
records is: 

(Cs + Ct)*(Th + Tb + 2Ti). 

Upon receiving the target records from the other 
backends, each backend will insert those incoming records 
into the hashing table of the target records and store them 
back to the secondary storage. Since those non-local target 
records are grouped and sent by their bucket numbers, the 
insertion time is so short that it may be ignored. By using 
an inverted list, the time complexity for each backend to 
return those incoming target records to the secondary 
storage is: 

(M-1)*(Ct/Cb)*Ti. 

Phase 2: Records of these two hashing tables 
will be processed one bucket at a time. For any bucket 
number (i.e., a table entry), if the buckets of both hashing 
tables are not empty, then all blocks of the records of both 
buckets will be read into the primary memory for the merging 
operation. It takes Ti to bring one bucket of source 
records (in this case, one block) into the primary memory 
and M*Ti for one bucket of target records (M blocks). The 
time complexity for accessing, comparing and possibly 
merging one bucket of source records with one bucket (M 
blocks) of target records (not including the disk I/O time) 
will be: 

Cb*[Tb + M*Cb*(Tb + Tc + f*Tm)]. 

The expected time complexity for all buckets will be: 

(Cs/Cb)*Cb*[Tb + M*Cb*(Tb + Tc + f*Tm)]. 

Therefore, the total time complexity for this alternative 
is: 

(Cs + Ct)*(Th + Tb + 2Ti) + (M-1)*(Ct/Cb)*Ti 
+ (Cs/Cb)*Cb*[Tb + M*Cb*(Tb + Tc + f*Tm)]. 

Figure 3.3 The Time Complexities of the 
Bucket-Hashing Implementations. 

A summary of the time complexity in terms of Th, 
Ti, Tb, and Tc for these two subalternatives is shown in 
Figure 3.3. As shown in Figure 3.3, the alternative which 
uses two separate tables is better than the other one which 
employs only one table. Since Cb and M are constants, f is 
smaller than 1 and Ct may be equal to Cs, we can further 
simplify the time complexity of the two-separate-tables 
subalternative to be: 

O(Cs+Ct) or 
O(Cs). 
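
To make the difference in growth concrete, the two cost expressions can be evaluated numerically. The sketch below encodes the per-phase costs described in the text for the straightforward implementation (phase 5 only) and for the two-separate-tables subalternative. The function names and every parameter value are assumptions chosen for illustration; they are not measurements from MBDS.

```python
def nest_loop_cost(Cs, Ct, Cb, M, Ti, Tb, Tc, Tm, k):
    """Phase-5 cost of the straightforward implementation."""
    per_block = (
        Ti + M * (Ct / Cb) * Ti      # read one source block, all target blocks
        + (Cb + M * Ct) * Tb         # access the records
        + Cb * M * Ct * Tc           # compare
        + k * M * Ct * Tm            # merge the participating fraction k
    )
    return (Cs / Cb) * per_block     # Cs/Cb source blocks per backend


def two_table_hash_cost(Cs, Ct, Cb, M, Ti, Tb, Tc, Th, Tm, f):
    """Total cost of the two-separate-tables subalternative."""
    phase1 = (Cs + Ct) * (Th + Tb + 2 * Ti) + (M - 1) * (Ct / Cb) * Ti
    phase2 = (Cs / Cb) * Cb * (Tb + M * Cb * (Tb + Tc + f * Tm))
    return phase1 + phase2


if __name__ == "__main__":
    # Made-up parameter values (seconds); only the growth trend matters.
    p = dict(Cb=10, M=4, Ti=30e-3, Tb=1e-5, Tc=1e-6, Tm=1e-5)
    for n in (10_000, 20_000):
        nl = nest_loop_cost(n, n, k=0.1, **p)
        th = two_table_hash_cost(n, n, Th=1e-5, f=0.1, **p)
        print(n, round(nl, 1), round(th, 1))
```

Under these assumed values, doubling both record sets roughly quadruples the first cost and doubles the second, which is the behaviour the O(Cs*Ct) and O(Cs+Ct) bounds predict.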



d. The Conclusion for Our Implementation Approach 

A summary of the analysis for those 
implementation approaches in terms of time complexity is 
shown in Figure 3.4. Clearly, the one based on 
bucket-hashing with two separate hashing tables is the best 
approach. Therefore, our implementation will be based on 
that approach. The details of design and implementation 
will be discussed in the next chapter. 

Figure 3.4 Time Complexity of Different Implementations. 

IV. DETAILED DESIGN FOR IMPLEMENTING RETRIEVE-COMMON 
OPERATION INTO MBDS 

In the previous chapter, a bucket-hashing based 
implementation approach has been selected for implementing 
the retrieve-common operation into MBDS. In this chapter, we 
focus on specifying the details of that approach and discuss 
any of the existing MBDS software which will be affected by 
this implementation. Our primary goal is to use the 
existing software as much as possible and to minimize the 
effects which may be caused by the implementation. 

The operations of the retrieve-common request may be 
described in four phases. First, the user's request must be 
preprocessed so that all backends can be informed by an 
appropriate message. This is the request-preprocessing 
phase. Second, the records of both the source and the 
target record sets are retrieved before the merging 
operation. This is the record-retrieving phase. Third, 
those retrieved records are hashed on the values of their 
join attributes and stored into a hashing table according to 
their hashed values (i.e., the bucket numbers). We recall 
that there are two hashing tables, one for the source 
records and one for target records. Further, the hashed 
local target records are broadcast to the other backends. 
This is the hashing-and-storing phase. Lastly, hashed 
records of source buckets and hashed records of target 
buckets are compared and merged bucket-by-bucket. 
The merged results are sent to the controller 
from all of the backends. This is the merging phase. The 
controller then forwards those results to the host computer. 

The operations of the first and second phases can be 
done by the existing system software with minor 
modifications. However, in order to accomplish the 
operations of the last two phases, we must design a new set 
of procedures, which we have referred to as the hashing 
module. In the remainder of this chapter, we first describe 
the hashing module, and then the operations of those four 
phases. 



A. THE HASHING MODULE 

This module is designed to accomplish the operations of 
the last two phases of the retrieve-common request. There 
are three procedures within this module. They are: the 
hashing procedure, the bucket-block tracking procedure and 
the merging procedure. In this section, we first discuss 
the two different alternatives for implementing this 
module. After choosing the better alternative, we then 
describe the three procedures of the hashing module. 

1. Alternatives for Implementing the Hashing Module 

There are two alternatives that may be used for 
implementing the hashing module. In the first alternative, 
the hashing module is implemented as a separate process of 
the backend. This alternative modifies the existing 
process structure of a backend by introducing a sixth 
process and its associated communication paths into each 
backend. In the second alternative, the hashing module is 
implemented as part of the existing record processing 
process (RECP). This alternative leaves the existing backend 
process structure unchanged. 

a. As a Separate Process 

In this alternative, the hashing module is 
designed as a separate process of the backend. The inputs 
to the hashing module are either the local source or target 
records from the local RECP or the other target records from 
the RECPs of the other backends. The outputs from the 
hashing module are the merged results, which are sent to the 
controller. The transfer of records between processes 
(i.e., non-local target records from "Put Pcl" to the 
hashing module or the local source records or the local 
target records from the local RECP to the hashing module) is 
accomplished using the interprocess message capabilities of 
each backend. The new process structure of each backend 
with the additional communication paths is shown in Figure 
4.1. Since the hashing module is an independent process, the 
effects of this implementation on the other processes of 
MBDS may be minimized. 

[Diagram not reproduced: process structure of each backend, 
including the Put Pcl and Get Pcl processes.] 

Figure 4.1 Hashing Module As a Separate Process. 

b. As a Procedure within Record Processing 

In this alternative, the hashing module is 
designed as a group of procedures that are added to RECP. 
In Figure 4.2 we show the structure of the hashing module 
within RECP. The local records (both the source records and 
the target records) are retrieved by the physical data 
operation of RECP of each backend. Once the records are 
retrieved, they are sent to the hashing module. The 
non-local target records are received by RECP from the other 
backends and then passed to the hashing module. The merged 
results are then sent to the controller. With modularized 
programming, the hashing module may be independently 
implemented with a minimal effect on the original RECP 
software. 

[Diagram not reproduced: RECP of each backend, with the 
Aggregate Operation, the Physical Data Operation, the local 
records and the Hashing Module.] 

Figure 4.2 Hashing Module as Part of RECP. 

c. Comparison of These Two Alternatives 

Both alternatives can be easily implemented with 
minimal effect on the existing system. The difference 
between these two alternatives is the way that the local 
records are passed from the "physical data operation" to the 
hashing module. In alternative (a), the records are passed 
as an interprocess message. In alternative (b), the records 
are passed as a parameter of a procedure call. We choose 
alternative (b) for three reasons. 

(1) The message-passing between two processes within a 
backend is slower than the parameter-passing. In 
message-passing, both processes have to access a 
common memory to put (or get) a message. The accessing 
time coupled with the time required to place a 
message in the common memory by the sender and fetch 
the message from the common memory by the receiver is 
considerable. In parameter-passing, only the logical 
address of the record buffer is passed between the 
procedures, which is much simpler and faster. 

(2) Even if message-passing within a computer is extremely 
fast, a large number of messages (i.e., records) must 
be routed between the two processes, so the total 
time spent on message-passing is still considerable. 

(3) The extra communication paths required by alternative 
(a) (i.e., the communication paths between the hashing 
module and the other MBDS processes) increase the 
number of messages passed within a backend and among 
backends. By increasing the inter-backend and 
intra-backend communication, we may adversely affect 
the overall performance of a backend. 

2. The Hashing Procedure 

This procedure is used to perform the hashing 
operation on the values of the join attributes of the input 
records. The inputs to the procedure are either the local 
source records or the local target records, which are 
received from the physical-data-operation subprocess of 
RECP. The outputs from the procedure are the input records 
and their hashed values (i.e., the bucket numbers), which 
are sent to the bucket-block tracking procedure with the 
request id for further processing. 

The hashing operation is done by the hashing 
functions of this procedure. Since the type of the values 
of the join attributes may either be an integer or a 
character string, we have designed two hashing functions in 
this procedure. Generally, a good hashing function should 
satisfy the following three requirements: 

(1) All of the records should be evenly distributed into 
buckets of the hashing table; 

(2) The chance of hashing different records into the same 
bucket should be minimized; and 

(3) The hashing computation should be fast. 

These requirements are closely related to the number of 
buckets and the hashing algorithm which is used in the 
hashing function. 

a. The Number of the Buckets 

A hashing table with a large number of buckets 
is useful for a number of reasons. First, the large number 
of buckets may reduce the chance of hashing different 
records into the same buckets. Second, the number of 
records in each bucket is also quite small, and this will 
reduce the access time during merging. However, it would be 
impractical to have a table with a very large number of 
bucket entries, where each bucket would only contain a few 
records. When the table becomes exceedingly large, a 
substantial cost is incurred to maintain the bucket index. 
The bucket index of a hashing table is an array of 
fixed-size bucket entries. There is a bucket entry for each 
bucket to keep track of the records which are stored in that 
bucket. Therefore, the number of buckets (and therefore the 
bucket entries) can be computed by the following equation: 

Let X be the size of the bucket index (measured in bytes), 
    Y be the size of a bucket entry (measured in bytes), 
then the number of buckets is (X / Y). 

For example, if the size of the bucket index of a hashing 
table is 8K bytes and the size of each bucket entry is 8 
bytes, then the number of bucket entries for that hashing 
table is 1K, i.e., 1024. 

How should we determine the size of the bucket 
index of our hashing table? Since MBDS allows the 
concurrent execution of different user transactions, there 
may be a number of retrieve-common requests being processed 
by the system. Each of the retrieve-common requests 
requires two hashing tables, one table for the source record 
set and one table for the target record set. Because of the 
potentially large number of hashing tables concurrently in 
use, it will be necessary to store the bucket indexes of the 
tables in the secondary storage and stage them into the 
primary memory on demand. To minimize and optimize the size 
of the bucket index of the hashing table, it is desirable to 
have the size of the bucket index as a multiple of the unit 
of disk I/O transfer. For example, if the unit of disk I/O 
transfer (which is typically the track size) is 4K bytes, 
then the size of the bucket index shall be M*4K bytes, where 
M = {1, 2, 3, ...}. In our case, we choose 16K bytes to be the 
size of our hashing table, yielding 2048 entries (therefore, 
2048 buckets) in the hashing table each with a bucket entry 
size of 8 bytes. 
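
The arithmetic above can be checked with a short sketch. The function names are hypothetical; the figures (4K-byte transfer unit, 8-byte entries, 8K-byte and 16K-byte indexes) are the ones quoted in the text.

```python
def index_size(io_unit_bytes, multiple):
    """Bucket index size as a whole multiple of the disk I/O unit."""
    return io_unit_bytes * multiple


def num_buckets(index_bytes, entry_bytes):
    """Number of bucket entries (and buckets): X / Y in the text."""
    return index_bytes // entry_bytes


if __name__ == "__main__":
    K = 1024
    print(num_buckets(8 * K, 8))                   # the 8K-byte example
    print(num_buckets(index_size(4 * K, 4), 8))    # the 16K-byte choice
```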

b. The Hashing Algorithm 

Since the value type of the join attribute may 
be either an integer or a character string, we have designed 
two hashing functions, one for each value type. 

(1) The Hashing Algorithm for the 
Integer-Valued Attributes. In order to evenly distribute 
the values of all join attributes into the buckets and to 
minimize the collisions, we use the information about the 
maximum and minimum values of the join attributes. This 
information is maintained in the record templates. The 
hashing algorithm for the integer attribute value is 
described as follows. 

Step 1: Get the MAX (maximum) and MIN (minimum) values of 
        the join attribute from the record template, let 
        X = The_number_of_buckets_in_hashing_table 

Step 2: If MAX - MIN < X 
        then go to step 4 
        else Temp1 = (MAX - MIN) Div X 

Step 3: Get the input record and let 
        Y = The_value_of_the_join_attribute 
        bucket_number = (Y - MIN) Div Temp1 
        go to step 5 

Step 4: Get the input record and let 
        Y = The_value_of_the_join_attribute 
        bucket_number = Y - MIN 

Step 5: Return the bucket number to the calling procedure. 
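
The five steps can be sketched in Python as follows. This is an illustration only, and int_bucket and its parameter names are hypothetical. The final clamp to the last bucket is our own addition and is not part of the steps as stated: with integer division, (Y - MIN) Div Temp1 can reach or exceed the number of buckets when Y is close to MAX.

```python
def int_bucket(value, min_value, max_value, num_buckets=2048):
    """Map an integer join-attribute value to a bucket number."""
    span = max_value - min_value
    if span < num_buckets:
        # Step 4: the value range fits, so use the offset directly.
        return value - min_value
    # Step 2: Temp1, the width of the value range covered by one bucket.
    width = span // num_buckets
    # Step 3: scale the offset down to a bucket number.
    bucket = (value - min_value) // width
    # Assumption (not in the original steps): clamp into the valid range.
    return min(bucket, num_buckets - 1)


if __name__ == "__main__":
    print(int_bucket(5, 0, 100))          # small range: offset is the bucket
    print(int_bucket(100, 0, 20480))      # wide range: width is 10
    print(int_bucket(10000, 0, 10000))    # clamped to the last bucket
```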

(2) The Hashing Algorithm for the 
Character-Valued Attributes. The record template does not 
provide the maximum and the minimum values for the 
character-valued attributes as it does for integer-valued 
attributes. In order to minimize 
collisions and distribute records evenly into buckets, we 
design a lookup table, which is an array with 2048 
character-string elements, to perform the hashing function. 
The number of the elements is equal to the number of the 
entries in the bucket index of the hashing table. The 
values of the join attributes of the input records are 
searched against the contents of the lookup table to obtain 
the bucket values. The binary search algorithm is used to 
minimize the searching time of the lookup table. 

The contents of the entries of the lookup 
table are created in the following way: 

(1) Get an English dictionary with more than 2048 pages; 

(2) Divide the number of pages by the number of the buckets 
(in our case the number is 2048); 

(3) Let the result be x.y, where x and y are positive 
decimal digits; 

(4) Pick up the last word of every x.y page from the 
dictionary and place the first four characters as an 
entry in the lookup table; and 

(5) If the length of the selected word is less than 4, 
fill the word with trailing blanks. 

We use only the first four characters to compare the values 
of join attributes for two reasons. First, we believe that 
there are very few English words that will have the same 
first four letters. Second, we want to reduce the 
primary-memory requirements for the lookup table. 

The algorithm for the character-valued 
attributes is as follows. 

Step 1: Let MIN = 0 and MAX = 2047. 

Step 2: Get the input record and let 
        X = The_value_of_the_join_attribute; 

Step 3: If X > look_up_table[MAX] 
        then 
        bucket_number = MAX, go to step 5. 

Step 4: Use binary search to find the bucket number. 

Step 5: Return the bucket number to the calling procedure. 
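
A minimal Python sketch of this lookup follows. It rests on several assumptions not stated in the text: the lookup table is sorted in ascending order, values are compared on their first four characters padded with blanks, and the binary search returns the first entry that is greater than or equal to the key. The name char_bucket and the four-entry table used in the example are hypothetical; the real table has 2048 entries.

```python
from bisect import bisect_left


def char_bucket(value, lookup_table):
    """Map a character-string join-attribute value to a bucket number."""
    key = value[:4].ljust(4)              # first four characters, blank-padded
    last = len(lookup_table) - 1          # MAX in the steps above
    if key > lookup_table[last]:
        return last                       # Step 3: beyond the last entry
    # Step 4: binary search for the first entry >= key.
    return bisect_left(lookup_table, key)


if __name__ == "__main__":
    table = ["dog ", "lamp", "sun ", "zeal"]   # toy table for illustration
    for word in ("apple", "dog", "mango", "zebra"):
        print(word, char_bucket(word, table))
```

Python's bisect module performs the binary search, so each lookup inspects about log2(2048) = 11 entries for a full-size table.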

3 . The Bucket-Blcck Trackin g P roced u re 

The input to this procedure may be either the local 
records (either the source records or the target records) 
with their bucket numbers from the hashing procedure or the 
non-local target records grouped by their bucket values from 
the other backends. The outputs from the procedure are the 
logical addresses of the hashing tables of the source 
request and the target request, which are sent to the 
merging procedure for the merging operation. The
bucket-block tracking procedure performs three functions:

(1) maintaining a global table to keep track of the
logical addresses of the hashing tables for all
retrieve-common requests which are currently being
processed in the system;

(2) maintaining a hashing table for the current request
and keeping track of all of the buckets and blocks of
that hashing table; and

(3) storing the input records into appropriate buckets and 
blocks according to their bucket values. 

In order to provide a better understanding of this 
procedure, we first introduce the structures of the blocks, 
the buckets, the hashing table and the global table. We then 
discuss how these functions are accomplished. 

a. The Structure of a Block 

Each block is divided into two parts: the header
and the body. The header has two fields. The first field is
used to record the length (in bytes) of the body, i.e., the
total size of all of the records currently stored in this
block. The second field is used to store the logical address
of the next block whose records have the same bucket value
as this block. If there is no other block in the bucket,
then this field holds a null address. The body is used to
store the hashed records and their common attribute values.
Blocks which are in the same bucket are maintained as an
inverted list and tracked by their logical addresses. The
structures of the block and its header are shown in Figure
4.3.

Figure 4.3 The Structures of a Block and Its Header
(panel B: The Structure of the Block Header).
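
The block layout can be illustrated with a minimal Python
sketch. It is an illustration under stated assumptions, not the
thesis code: the body capacity of 64 bytes is arbitrary, a
logical address is simplified to an integer, and the names
Block and chain_addresses are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

BLOCK_BODY_CAPACITY = 64  # hypothetical body size in bytes, small for the demo


@dataclass
class Block:
    body_length: int = 0              # header field 1: bytes used in the body
    next_block: Optional[int] = None  # header field 2: next block, None = null
    body: bytearray = field(default_factory=bytearray)

    def has_room_for(self, record: bytes) -> bool:
        return self.body_length + len(record) <= BLOCK_BODY_CAPACITY

    def append(self, record: bytes) -> None:
        """Store one record in the body and update the length field."""
        if not self.has_room_for(record):
            raise ValueError("block is full")
        self.body += record
        self.body_length = len(self.body)


def chain_addresses(storage: Dict[int, Block], first: Optional[int]) -> List[int]:
    """Follow the next-block links of one bucket and list the addresses."""
    addresses = []
    while first is not None:
        addresses.append(first)
        first = storage[first].next_block
    return addresses
```

Following the second header field from the block named in the
bucket entry visits every block of that bucket, newest first.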



b. The Structure of a Bucket

As mentioned in Chapter II, instead of using
primary and overflow areas, each bucket uses fixed-size
blocks to store records. The number of blocks per bucket
may vary among different buckets. The bucket entry is used
to indicate the status and to keep track of the blocks of
that bucket.

Each bucket entry in the bucket index has two
parts: the status and the logical address of the block
currently being used. The status is used to indicate
whether or not the bucket is empty. The size of the bucket
entry is 8 bytes, where 2 bytes are used for the status and
6 bytes are used for the logical address, which is
represented by a tuple consisting of the logical disk
number, the logical cylinder number and the logical track
number. The structure of a bucket entry is shown in
Figure 4.4.

Figure 4.4 The Structure of a Bucket-entry.



c. The Structure of the Hashing Table 

A hashing table is an array of bucket entries. 
We anticipate that the retrieve-common operation will be
implemented on a SUN Workstation running the UNIX operating
system, with a 16K unit of disk I/O. Using the equation
from the previous subsection, we can compute the number of 
bucket entries for our hashing table to be 2048. 
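
The 8-byte bucket entry and the 2048-entry bucket index can be
illustrated with a minimal Python sketch. The split of the
6-byte logical address into three 2-byte numbers, the status
codes and all function names are assumptions made for
illustration; the thesis gives only the 2-byte and 6-byte
totals.

```python
import struct

# Assumed layout: 2-byte status, then 2 bytes each for the logical disk,
# cylinder and track numbers (the thesis states only the 6-byte total).
ENTRY_FORMAT = "<HHHH"
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)
BUCKET_COUNT = 2048
EMPTY, NOT_EMPTY = 0, 1  # hypothetical status codes


def pack_entry(status, address):
    disk, cylinder, track = address
    return struct.pack(ENTRY_FORMAT, status, disk, cylinder, track)


def unpack_entry(raw):
    status, disk, cylinder, track = struct.unpack(ENTRY_FORMAT, raw)
    return status, (disk, cylinder, track)


def new_hashing_table():
    """Return the bucket index as one buffer with every bucket empty."""
    return bytearray(pack_entry(EMPTY, (0, 0, 0)) * BUCKET_COUNT)


def set_entry(table, bucket, status, address):
    start = bucket * ENTRY_SIZE
    table[start:start + ENTRY_SIZE] = pack_entry(status, address)


def get_entry(table, bucket):
    start = bucket * ENTRY_SIZE
    return unpack_entry(bytes(table[start:start + ENTRY_SIZE]))
```

Under this layout the whole index occupies 2048 x 8 = 16,384
bytes, which is the same size as the 16K unit of disk I/O
mentioned above.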

d. The Global Table 

Since MBDS allows concurrent processing during
the retrieval operation, there may be several
retrieve-common requests in the system. We need a table
that keeps track of the logical addresses of the
hashing tables for all of the retrieve-common requests. Each
entry of the global table contains two parts: the request id
of the request and the logical address of the hashing table
for that request. The request id consists of the traffic id,
which is the unique identifier of a traffic unit [Ref. 11:
p. ^1], and the request number, which indicates the sequence
of the request in the traffic unit. An entry of the
global table is created whenever a new hashing table is
created, and deleted when that request has completed
processing. The structure of the global table is shown in
Figure 4.5.

Figure 4.5 The Structure of the Global Table.



e. The Sequence of the Operations of the 
Bucket-block Tracking Procedure 

The steps of the sequence to accomplish the
operations of this procedure are described as follows.

Step 1: Create and initialize the global table.

Step 2: Check the request ID of the input records against
the global table to see if the input records belong to
a new request. If they do, then allocate a hashing
table for that request, initialize the bucket
index and store the logical address of the hashing
table into the global table. Otherwise, bring the
existing hashing table into the primary memory
using the logical address information provided by
the global table.

Step 3: Extract a record from the input buffer. If the
record is the first record of that request, then
go to step 10.

Step 4: If the bucket value of this record is the same as
the previous one, then go to step 8.

Step 5: Store the block which contains the previous record
back to the secondary storage.

Step 6: Get the desired bucket entry (table entry) for the
record by its hashed bucket-value. Check the
status of the bucket. If it is "empty", then go
to step 11.

Step 7: Get the currently used block by its logical
address in the bucket entry.

Step 8: If there is space in the block that is available
for storing this record, then go to step 12.

Step 9: Get a new block, put the current logical address
of the bucket entry into the "logical address of
next block" field of the block header. Then,
update the bucket entry with the logical address
of this new block. Go to step 12.

Step 10: Get the desired bucket entry by its hashed
bucket-value, update the status of that bucket
entry to "not empty".

Step 11: Get a new block and put its logical address into
the bucket entry.

Step 12: Store the record into the block and update the
"length of record" field of the block header.

Step 13: Repeat the steps 3 to 12 until all records have
been processed.

Notice that the block is not immediately
returned to the secondary storage after the insertion of one
input record. Since the records in MBDS are stored by
clusters, it is very likely that records within the same
cluster will be retrieved again. Therefore, by keeping the
current block in the primary memory, we may save one store
and one read operation if the next input record is
retrieved from the same cluster and hashed into the same
bucket (that is, they may have the same bucket value).
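
The effect of keeping the current block in the primary memory
can be illustrated with a minimal in-memory Python simulation of
the steps above. It is a sketch under stated assumptions, not
the thesis code: block capacity is counted in records instead of
bytes, a dictionary stands in for the secondary storage, a full
block is written back before a new one is chained (the steps do
not say when this happens), and the class name
BucketBlockTracker is hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

RECORDS_PER_BLOCK = 3  # hypothetical capacity, counted in records


class BucketBlockTracker:
    def __init__(self, bucket_count: int) -> None:
        # None plays the role of the "empty" status of a bucket entry.
        self.bucket_entries: List[Optional[int]] = [None] * bucket_count
        self.disk: Dict[int, dict] = {}  # simulated secondary storage
        self.next_address = 0
        self.reads = 0
        self.writes = 0

    def _new_block(self, next_block: Optional[int]) -> Tuple[int, dict]:
        address = self.next_address
        self.next_address += 1
        return address, {"next": next_block, "records": []}

    def _write(self, address: int, block: dict) -> None:
        self.disk[address] = block
        self.writes += 1

    def store_all(self, hashed_records: Iterable[Tuple[int, str]]) -> None:
        current_bucket = None
        address, block = None, None
        for bucket, record in hashed_records:
            if bucket != current_bucket:
                if block is not None:                  # Step 5
                    self._write(address, block)
                current_bucket = bucket
                address = self.bucket_entries[bucket]  # Step 6
                if address is None:                    # Steps 10 and 11
                    address, block = self._new_block(None)
                    self.bucket_entries[bucket] = address
                else:                                  # Step 7
                    block = self.disk[address]
                    self.reads += 1
            if len(block["records"]) >= RECORDS_PER_BLOCK:  # Steps 8 and 9
                self._write(address, block)
                address, block = self._new_block(address)
                self.bucket_entries[bucket] = address
            block["records"].append(record)            # Step 12
        if block is not None:
            self._write(address, block)                # flush the last block

    def records_in(self, bucket: int) -> List[str]:
        out, address = [], self.bucket_entries[bucket]
        while address is not None:
            out.extend(self.disk[address]["records"])
            address = self.disk[address]["next"]
        return out
```

In this simulation a run of records with the same bucket value
costs one write at the end of the run, instead of one read and
one write per record.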

4. The Merging Procedure

This procedure is used to perform the merging 
operation. The inputs to this procedure are the logical 
addresses of the hashing tables of the source request and 
the target request, which come from the bucket-block 
tracking procedure. The outputs from this procedure are the 
merged results, which are sent to the controller. 

The algorithm of the merging procedure is as
follows.

Step 1: Reserve a result buffer.

Step 2: Get the hashing tables of the source request and
the target request by their logical addresses.

Step 3: Compare the bucket statuses of these two hashing
tables bucket by bucket. If both buckets contain
records for a particular bucket number, then
retrieve all the records associated with this
particular bucket value from both tables.

Step 4: Apply the straightforward merging algorithm on
those retrieved records. Insert merged results
into the result buffer.

Step 5: If the result buffer is full, then send its
contents to the controller.

Step 6: Repeat steps 3, 4 and 5 until all the buckets have
been processed.

Step 7: Free the result buffer.
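
The merging steps can be illustrated with a minimal Python
sketch. It is not the thesis code. It assumes that the
straightforward merging algorithm is a nested-loop comparison
of join attribute values, that records are dictionaries, and
that a partly filled result buffer is sent at the end (the
steps above do not state this). The name merge_tables and its
parameters are hypothetical.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]


def merge_tables(
    source_buckets: List[List[Record]],
    target_buckets: List[List[Record]],
    join_attribute: str,
    send: Callable[[List[Record]], None],
    buffer_capacity: int = 2,
) -> None:
    result_buffer: List[Record] = []                 # Step 1
    for bucket in range(len(source_buckets)):        # Step 3: bucket by bucket
        source_records = source_buckets[bucket]
        target_records = target_buckets[bucket]
        if not source_records or not target_records:
            continue                                 # one side is empty
        for s in source_records:                     # Step 4: nested loop
            for t in target_records:
                if s[join_attribute] == t[join_attribute]:
                    result_buffer.append({**s, **t})
                    if len(result_buffer) == buffer_capacity:  # Step 5
                        send(result_buffer)
                        result_buffer = []
    if result_buffer:            # assumption: send the remainder at the end
        send(result_buffer)
```

Only buckets that are non-empty in both tables are compared, so
records whose join values hash to different buckets are never
compared with each other.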

B. THE OPERATIONS OF THE FOUR PHASES

In this section we discuss the operations of each phase 
of the retrieve-common request and the software which will 
be affected by those operations. 

1. The Request-preprocessing Phase

a. The Operations 

The operations of this phase include parsing the
user's transaction (or request). If the transaction
(request) is correctly parsed, then the controller
composes an appropriate message to inform the backends to
begin execution of the request. Since the retrieve-common
request is conceptualized and executed as two retrieval
operations, the parser has to parse the user's request and
transform it from the form of a single request to the
form of a transaction with two requests.

b. The Affected Software 

Basically, operations of this phase can be done 
by the existing Request Preparation process. However, the 
software for this process must be modified as follows: 

(1) The parser should be able to recognize the newly added
syntax and correctly parse the request;

(2) The composer should be able to form a new message to
inform PP and all of the backends so that they can
perform the desired operation;

(3) New message types are added for processing the
retrieve-common request; and

(4) PP and all of the backends should be able to recognize
and process the newly created message for the
retrieve-common request.

2. The Record-retrieving Phase
a. The Operations 

Operations of this phase include the address
generation and the record retrieval for both the source
request and the target request. These two requests are
processed by DM in the same way as the other four types of
requests. As mentioned in the previous chapter, the target
records are processed after the source records. In order to
separate the records of these two requests, DM will first
send the source request and its associated address set to
RECP, and hold the target request and its address set
until it receives a message from RECP indicating that all
source records have been retrieved.

The record-retrieving operation is performed by
the physical-data-operation subprocess in RECP as a regular
retrieve request. Instead of sending the retrieved records
to the controller, control logic is used to route them to
the hashing module for hashing and subsequent merging.

b. The Affected Software

Most of the operations of this phase are done by
DM, CC and the Physical Data Operation of RECP in each
backend. The affected software includes:

(1) We need to add control logic into DM so that the
address information of the source and target requests
will not be sent to RECP together; and

(2) We need to add a new procedure to handle the
retrieve-common request and control logic to route
the results to the hashing module instead of to PP.

3. The Hashing-and-storing Phase

This is the most important part of the
retrieve-common request. All of the records are prepared in
this phase, so that they can be merged in the next phase. The
operations of the hashing-and-storing phase include:

(1) performing hashing operations on the local records,

(2) table maintenance and bucket-block tracking
operations, and

(3) broadcasting (and receiving) the target records and
their bucket-values to (from) the other backends.

a. The Hashing Operations 

This operation is performed by the hashing 
procedure of the hashing module. Upon receiving the local 
records from the previous phase, the hashing procedure will 
check the record template to get the value type of the 
common attribute values and then apply an appropriate 
hashing function to hash the common attribute values. The 
records and their hashed bucket-values will then be passed 
to the bucket-block tracking procedure for further 
processing. 

b. Table-maintenance and Bucket-block Tracking Operation

This operation is done by the bucket-block
tracking procedure. A global table is maintained to store
the addresses of all of the hashing tables for all of the
different retrieve-common requests which are currently being
processed by the system. Whenever a new retrieve-common
request is encountered, the bucket-block tracking procedure
will create a new hashing table for that request. The
logical address of the newly created hashing table is then
stored into the global table. The hashing table will be
deleted when the request is complete. Records are stored
into buckets according to their hashed values. The
information in the bucket entries and the block headers is
maintained and updated by the bucket-block tracking
procedure as described in the previous section.
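
The maintenance of the global table can be illustrated with a
minimal Python sketch. It is an illustration under stated
assumptions, not the thesis code: the table is held as a
dictionary keyed by the request id (traffic id and request
number), and the class name GlobalTable, its methods and the
allocate callback are hypothetical.

```python
from typing import Callable, Dict, Tuple

RequestId = Tuple[str, int]     # (traffic id, request number)
Address = Tuple[int, int, int]  # (logical disk, cylinder, track)


class GlobalTable:
    def __init__(self) -> None:
        self._entries: Dict[RequestId, Address] = {}

    def address_for(
        self, request_id: RequestId, allocate: Callable[[], Address]
    ) -> Tuple[Address, bool]:
        """Return (hashing-table address, created) for a request.

        A new request gets a newly allocated hashing table; a known
        request gets the address already recorded for it.
        """
        if request_id in self._entries:
            return self._entries[request_id], False
        address = allocate()
        self._entries[request_id] = address
        return address, True

    def complete(self, request_id: RequestId) -> None:
        """Delete the entry once the request has finished processing."""
        del self._entries[request_id]

    def __len__(self) -> int:
        return len(self._entries)
```

Keying the table by the full request id keeps the source and
target requests of one traffic unit apart, since they share the
traffic id and differ only in the request number.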

c. Broadcasting and Receiving Target Records Between Backends

After the local target records have been hashed
and processed, each backend will buffer its local target
records (retrieved from the target-hashing table with their
bucket values) and broadcast them to the other backends.
Upon receiving those non-local target records, each backend
will store them into the target-hashing table by their
bucket values. A checklist is used to ensure that the
target information from all of the other backends has been
received.
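
The checklist can be illustrated with a minimal Python sketch.
The thesis does not describe its structure, so this is only one
possible form: a set of the backends that have not yet
delivered their target records, under the assumption that each
backend signals the end of its broadcast exactly once. The
class name Checklist and the backend identifiers are
hypothetical.

```python
from typing import Iterable, Set


class Checklist:
    def __init__(self, own_id: str, all_backend_ids: Iterable[str]) -> None:
        # Every backend except this one still owes its target records.
        self.pending: Set[str] = set(all_backend_ids) - {own_id}

    def mark_received(self, backend_id: str) -> None:
        """Record that the target records of one backend have arrived."""
        self.pending.discard(backend_id)

    def complete(self) -> bool:
        """True once all of the other backends have been heard from."""
        return not self.pending
```

Merging would begin only after complete() returns true, so that
the local source records are compared with the entire set of
target records.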



d. The Affected Software 

Since the operations of this phase are done by
the hashing module, RECP is affected to the extent that this
module is integrated into the RECP process. No other
existing software will be affected.

4. The Merging Phase

This is the last phase of the retrieve-common
operation. The local source records and the entire set of
target records are compared and merged.

a. The Operation

The operations are performed by the merging
procedure of the hashing module. Because the records of
both tables are unsorted, they are merged by using the
straightforward algorithm. The merged results are stored in
a result buffer and then sent to the controller.

b. The Affected Software

Since this phase is also done by the hashing
module, RECP is affected to the extent that this module is
integrated into the RECP process. No other existing system
software is affected.

V. THE IMPLEMENTATION



In this chapter, we describe how the retrieve-common
request is integrated into the MBDS system. To successfully
perform the integration, it is necessary to modify a portion
of the MBDS software. Therefore, this chapter also
discusses how the MBDS software is modified for the
integration and implementation of the retrieve-common
operation.

In the remainder of this chapter we first describe the
modified processes of the controller. Second, we describe
the modified processes of each backend. Then, we present
the modified MBDS message-passing facilities. Finally, we
trace the execution sequence of the retrieve-common request
in terms of the types of messages that are passed among the
MBDS processes.

A. THE MODIFIED PROCESSES OF THE CONTROLLER

1. The Request Preparation Process (REQP)

There are two subprocesses in REQP, namely the
parser and the composer. The parser parses the requests and
checks for syntax errors. The composer transforms the
correctly parsed requests into the form required for
processing at the backends.

a. The Parser 

The parser does both the lexical and the
syntactical analyses of the ABDL transaction (or requests).
The input to the parser is either a request or a
transaction. The outputs from the parser are the error
messages to the test interface, the aggregation operators to
PP and the correctly parsed requests to the composer.

The lexical analysis is done by the lexical
analyzer produced by LEX [Ref. 11: p. 42]. The input to
LEX is a specification of the tokens of the language (i.e.,
the tokens of ABDL) in the form of regular expressions and a
set of subroutines which specify the actions to be taken
upon recognition of the tokens. The syntactical analyzer is
generated by YACC (Yet Another Compiler Compiler) [Ref. 12].
The input to YACC is a specification which includes the
declarations of the tokens' names, the rewriting rules of the
grammar, and the action program. YACC produces a C program
to determine whether the input ABDL transactions (requests)
are syntactically correct.

For the parser to correctly parse the users' 
retrieve-common requests, we have made several modifications 
to the original parser subprocess. These modifications are 
listed below. 

(1) Regular expressions for LEX.

We have added a new set of regular expressions so 
that the lexical analyzer can recognize the 
retrieve-common request and generate appropriate 
tokens which in turn can be recognized and used by 
YACC. 

(2) Grammar rules for YACC. 

A new set of rules has been added into the original 
ABDL grammar so that the parser can recognize those 
tokens which are generated for retrieve-common request 
and organize those tokens by these newly created 
rules. 

(3) The request type.

We have added a new request type, the retrieve-common
request, so that the parsed transaction can be
correctly identified and properly executed by the
composer and the other processes of MBDS.

(4) The action program.

The input of the retrieve-common request to the parser
is in the form of a single request. The parser should
be able to parse this request and generate a
transaction of two retrieval requests (each of the
retrieve-common request type). If the join attribute
is not in the target list (of the source or the target
request), the action program inserts the join
attribute at the head of the target list. The extra
attribute-value pairs (i.e., the join attribute-value
pairs) of the retrieved records are deleted by the
merging procedure, so that the merged results contain
only the desired attribute-value pairs. The newly added
regular expressions, grammar rules and the SSL for
the modified action program are provided in Appendix
A.
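
The handling of the target list can be illustrated with a
minimal Python sketch. It is not the action program of
Appendix A. The request representation, the flag that records
whether the join attribute was inserted, and the names
Retrieval, prepare_request and strip_added are all
hypothetical, and the query is treated as opaque text rather
than as ABDL syntax.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Retrieval:
    query: str               # opaque placeholder, not real ABDL syntax
    target_list: List[str]
    join_attribute: str
    added_join_attribute: bool = False  # True if the parser inserted it


def prepare_request(query: str, target_list: List[str], join_attribute: str) -> Retrieval:
    """Build one of the two retrieval requests of the transaction."""
    if join_attribute in target_list:
        return Retrieval(query, list(target_list), join_attribute)
    # Insert the join attribute at the head of the target list.
    return Retrieval(query, [join_attribute] + list(target_list), join_attribute, True)


def strip_added(record: Dict[str, str], request: Retrieval) -> Dict[str, str]:
    """Drop the join attribute-value pair if the parser had inserted it."""
    if not request.added_join_attribute:
        return dict(record)
    return {k: v for k, v in record.items() if k != request.join_attribute}
```

Recording whether the attribute was inserted lets the merging
step remove it only from the requests where the user did not
ask for it.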



b. The Composer

The composer receives the correctly parsed
requests from the parser and formats them into the required
message format. Then, the composer broadcasts the formatted
messages to all of the backends for execution. We have
modified the original composer program so that the composer
can correctly reformat the retrieve-common request.

2. The Post Processing Process (PP)

The post processing process includes the aggregate
post operation and the reply monitor. The functions of PP
are described in [Ref. 11: p. 27]. The aggregate post
operation is not modified. The only modification in the
reply monitor is to recognize the new request type for the
retrieve-common request.

B. THE MODIFICATION OF THE BACKEND PROCESSES

As described in Chapter II, one of the design issues of
MBDS is to assign as much work as possible to the backends.
Consequently, there are more changes in the processes of
each backend than changes in the controller. The affected
processes are directory management and record processing.

1. The Directory Management Process (DM)

DM receives the new transaction message for the
retrieve-common request from the request composer and then
performs a number of directory operations, which include
attribute search, descriptor search, cluster search, address
generation and directory table maintenance. From our
earlier discussion, we know that the source and target
requests of a retrieve-common request should not be
processed concurrently by RECP. Therefore, DM first
processes the source request and sends the request and its
addresses to RECP. The target request is held in DM until
RECP notifies DM that the source request has finished
execution.

At what stage of the DM processing do we hold the
target request? There are several alternatives for holding
the target request in DM. These alternatives are listed
below.

(1) Hold the target request without performing any
directory operation.

(2) Hold the target request after it completes attribute
search.

(3) Hold the target request after it completes attribute
search and descriptor search.

(4) Hold the target request after it completes attribute
search, descriptor search and cluster search.

(5) Hold the target request after it completes attribute
search, descriptor search, cluster search, and address
generation.



Alternatives 2, 3, 4, and 5 will generate status and 
directory information for the target request which must be 
held somewhere. Due to the large number of the possible 
attributes, the size of the status and directory information 
may be too big to be kept in the primary memory, i.e., they 
will have to be stored back to the secondary storage. The 
extra disk I/O time for moving the status and directory
information in and out of the primary memory not only slows
the retrieve-common operation, but also increases the
program complexity and causes many unnecessary changes to
the existing software. Therefore, we choose alternative (1)
to process the target request.

The algorithm for the modified DM is as follows.

Step 1: Get the next message from the message queue and
find the sender of the message.

Step 2: If the sender is the controller, then go to step
5.

Step 3: If the sender is RECP, then go to step 8.

Step 4: If the sender is CC, then go to step 11.

Step 5: If this is not a retrieve-common transaction, then
go to step 11.

Step 6: Identify and separate the source request and the
target request from the transaction. Hold the
target request and perform the directory
processing on the source request.

Step 7: Send the source request with its address set to
RECP. Go to step 1.

Step 8: If this is not the message which indicates the
completion of retrieving all the source records,
then go to step 11.

Step 9: Get the corresponding target request and perform
directory processing on that target request.

Step 10: Send the target request with its address set to
RECP.

Step 11: Perform the original DM operation.

The SSL for the modified DM is provided in Appendix B. 
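
The dispatching logic of the modified DM can be illustrated
with a minimal Python sketch. It is not the SSL of Appendix B.
The message fields, the sender names, the placeholder for
directory processing and the class name DirectoryManagement are
hypothetical, and the sketch assumes that DM returns to the
message queue after sending the target request.

```python
from typing import Dict, List, Tuple


class DirectoryManagement:
    def __init__(self) -> None:
        self.held_targets: Dict[str, str] = {}  # traffic id -> target request
        self.sent_to_recp: List[Tuple[str, str]] = []

    def directory_processing(self, request: str) -> Tuple[str, str]:
        # Placeholder for attribute, descriptor and cluster search and
        # address generation; returns the request with a dummy address set.
        return request, "address set"

    def handle(self, sender: str, message: dict) -> str:
        """Process one message taken from the message queue (Step 1)."""
        if sender == "controller" and message.get("type") == "retrieve-common":
            # Step 6: separate the two requests and hold the target.
            self.held_targets[message["traffic_id"]] = message["target"]
            # Step 7: process the source request and send it to RECP.
            self.sent_to_recp.append(self.directory_processing(message["source"]))
            return "source sent"
        if sender == "RECP" and message.get("type") == "source-retrieve-finished":
            # Steps 9 and 10: release the held target request.
            target = self.held_targets.pop(message["traffic_id"])
            self.sent_to_recp.append(self.directory_processing(target))
            return "target sent"
        return "original DM operation"           # Step 11
```

Because the target request is held before any directory
operation is performed on it, the only state DM keeps for it is
the request itself, which corresponds to alternative (1).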

2. The Record Processing Process (RECP)

RECP receives the requests and their address sets
from DM and performs the physical data operations on those
requests. The original physical-data-operation subprocess
includes a control function and a subfunction for each type
of request. The subfunctions are invoked by the control
function according to the type of request being processed.

In order to process the retrieve-common request, we
have made two modifications to RECP:

(1) adding a new subfunction, the retrieve-common
subfunction, into the physical-data-operation
subprocess; and

(2) adding a new subprocess, the hashing module, into
RECP.

a. The Retrieve-Common Subfunction 

The purpose of the retrieve-common subfunction
is to direct the flow of control in the
physical-data-operation subprocess so that the
retrieve-common request can be processed correctly. The
differences between the retrieve-common subfunction and the
retrieve subfunction can be summarized as follows.

(1) The retrieve subfunction sends the retrieved records
to PP, whereas the retrieve-common subfunction
sends the retrieved records to the hashing module.

(2) In addition to sending a message to CC to indicate the
completion of the retrieval of physical data (as the
retrieve subfunction does), the retrieve-common
subfunction sends a message to notify DM that all
the source records have been processed.



The algorithm for the retrieve-common 
subfunction is as follows. 

Step 1: Reserve a result buffer. 

Step 2: For each address in the set of tracks which are
furnished by DM, fetch the track from the disk
and place it in the track buffer in the primary
memory.

Step 3: Examine the records in the buffer one-by-one. If
the record is marked for deletion, disregard it.
If the record does not satisfy the query,
disregard it. If a record satisfies the query,
then extract the values for the attribute names in
the target-list of the request and store this
information in the result buffer.

Step 4: When the result buffer is full, send the contents
of the buffer to the hashing module.

Step 5: Repeat steps 2, 3 and 4 until there are no more
addresses for the request.

Step 6: Send a message to CC to release the lock for this 
request. If this is a source request, then send a 
message to DM so that DM can process the target 
request. 

Step 7: Free the result buffer. 



The SSL for the modified control function and the 
retrieve-common subfunction are provided in Appendix C. 



b. The Hashing Module 



The hashing module performs the hashing and
merge operations. The merged results are sent to the
controller. The module is invoked by the retrieve-common
subfunction of the physical-data-operation subprocess.
There are three procedures within this module: the hashing
procedure, the bucket-block tracking procedure and the
merging procedure.

(1) The Hashing Procedure. The hashing
procedure receives the records from the retrieve-common
subfunction of the physical-data-operation subprocess and
performs the hashing function on the value of the join
attribute of each record. The records and their hashed
results are stored in a result buffer. When the buffer is
full, its contents are passed to the bucket-block tracking
procedure for further processing.

The algorithm for the hashing procedure is
as follows.

Step 1: Reserve a result buffer.

Step 2: Get the data type of the value of the join
attribute from the record template.

Step 3: Extract a record from the input buffer which is
passed from the retrieve-common subfunction.

Step 4: Apply the appropriate hashing function to hash the
value of the join attribute of the record
according to its data type. (See Chapter IV.)

Step 5: Store the record and the hashed bucket value in
the result buffer.

Step 6: If the result buffer is full, then send the
contents of the result buffer to the bucket-block
tracking procedure.

Step 7: Repeat steps 3, 4, 5 and 6 until there are no more
records in the input buffer.

Step 8: Free the result buffer.

The SSL for the hashing procedure is provided in Appendix D. 
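
The buffering behaviour of the hashing procedure can be
illustrated with a minimal Python sketch. It is not the SSL of
Appendix D. The hashing functions are passed in as parameters
because this chapter does not define them, a partly filled
buffer is assumed to be delivered at the end, and the name
hashing_procedure and its parameters are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Record = Dict[str, object]
Hashed = Tuple[int, Record]


def hashing_procedure(
    input_records: Iterable[Record],
    join_attribute: str,
    value_type: str,
    hash_functions: Dict[str, Callable[[object], int]],
    deliver: Callable[[List[Hashed]], None],
    buffer_capacity: int = 2,
) -> None:
    result_buffer: List[Hashed] = []                    # Step 1
    hash_function = hash_functions[value_type]          # Step 2: pick by type
    for record in input_records:                        # Step 3
        bucket = hash_function(record[join_attribute])  # Step 4
        result_buffer.append((bucket, record))          # Step 5
        if len(result_buffer) == buffer_capacity:       # Step 6
            deliver(result_buffer)
            result_buffer = []
    if result_buffer:   # assumption: deliver the remainder at the end
        deliver(result_buffer)
```

The deliver callback stands for the hand-off to the
bucket-block tracking procedure, which receives each record
together with its bucket value.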

(2) The Bucket-block Tracking Procedure. This
procedure stores the records (both the source records and
the target records) into blocks according to their bucket
values and maintains one hashing table for the currently
processed request and one global table to store the
logical addresses of the hashing tables for all of the
retrieve-common requests in the system. The inputs to this
procedure are the records and their hashed bucket values,
which come either from the local hashing procedure or from
the other backends. A checklist is used to ensure that the
hashed results of the non-local target records are received
from all of the other backends. There is also an additional
disk I/O buffer used in this procedure to move the blocks of
each bucket into and out of the primary memory. The outputs
from this procedure are the logical addresses of the two
hashing tables of the source request and the target request,
which are passed to the merging procedure. The structures of
the global table, hashing table, bucket, and block have been
described in Chapter IV. After processing all of the local
records, this procedure will group the local target records
together with their bucket numbers, and then broadcast them
to all of the other backends.



The algorithm for this procedure is as
follows.

Step 1: Create the global table and reserve a disk I/O
buffer.

Step 2: Get an input buffer of records. If the input
buffer contains source records, then go to step 5.

Step 3: If the input buffer contains local target records,
then go to step 6.

Step 4: If the input buffer contains the target records
received from the other backends, then go to step
8.

Step 5: Get the hashing table for the source request. Go
to step 7.

Step 6: Get the hashing table for the target request.

Step 7: Store the record into a bucket and perform the
bucket-block tracking operation (as described in
Chapter IV). Go to step 9.

Step 8: Perform the bucket-block tracking operations to
insert these incoming records into the target
hashing table.

Step 9: Repeat steps 2 to 8 until all records have been
processed.

Step 10: If the input buffer contains local target
records, then retrieve the local target records
from the target hashing table bucket-by-bucket
and broadcast them (with the bucket number) to
the other backends.

Step 11: If the input buffer contains non-local target
records, then get the logical address of the
hashing table of the source request. Pass the
logical addresses of the hashing tables of the
source request and the target request to the
merging procedure for the merging operation.

The SSL for this procedure is provided in Appendix E. 



(3) The Merging Procedure. This procedure performs
three functions:

(1) fetching the hashing tables of the source request and
the target request by their logical addresses which
have been provided by the bucket-block tracking
procedure;

(2) performing the merging operation on the records of
both hashing tables (as described in Chapter IV); and

(3) sending the merged results to the controller.

The merged results contain only the
attribute-value pairs whose attribute names are specified in
the target-lists (of either the source request or the target
request). The extra attribute-value pairs (i.e., the join
attributes and their values, which have been added into the
target lists by the parser) are deleted by this procedure.
The SSL for the merging procedure is provided in Appendix E.

C. THE MODIFIED MESSAGE-PASSING FACILITIES

In Chapter II we introduced the general format and
the different types of MBDS messages (see Figure 2.3 and
Figure 2.4). In order to accomplish the retrieve-common
request we have added two new message types, which are shown
in Figure 5.1.

D. EXECUTION OF A RETRIEVE-COMMON REQUEST VIEWED VIA
MESSAGE-PASSING

In this section we describe the sequence of actions for
executing the retrieve-common request as it moves through
MBDS. The sequence of actions is described in terms of the
types of messages passed between the MBDS processes: REQP,
PP, DM, RECP and CC. The order in which messages are passed
is denoted alphabetically ('a' is first). The digits
following the ordering letter give the message type as
shown in Figures 2.4 and 5.1.

The sequence of actions for a retrieve-common request is 
shown in Figure 5.2. First the retrieve-common request comes 
to REQP from the host (a1). EEQP sends two messages to PP : 
the number of requests in the transaction (b3) and the 
aggregate operator cf the request (c4) . The third message 
Message Type:  (32) Hashed Target Records 
Source:        Record Processing 
Destination:   Record Processing (other backends) 
Explanation:   This message contains the bucket numbers 
               of the target hashing table and all of 
               the target records associated with 
               their buckets. 

Message Type:  (33) Source Retrieve Finished 
Source:        Record Processing 
Destination:   Directory Management (same backend) 
Explanation:   This message is used to notify Directory 
               Management that all of the source 
               records have been retrieved. DM can then 
               begin processing the target request. 

Figure 5.1 The New MBDS Message-Types. 

sent by REQP is the parsed traffic unit, which goes to DM in 
the backends (d6). DM sends the type-C attributes needed by 
the retrieve-common request to CC (e20). Once an attribute 
is locked and descriptor search can be performed, CC signals 
DM (f26). DM then processes the source request (the target 
request is held for now). DM performs descriptor search and 
signals CC to release the lock on that attribute (g23). DM sends the 
descriptor ids for the request to the other backends (h15). 
The DM processes in the other backends send their descriptor 
ids to the DM process residing in this backend (i15). DM 
then uses its own descriptors and the descriptors received 
from the other backends to form descriptor-id groups. DM 
now sends the descriptor-id groups for the source request to 



Figure 5.2 The Sequence of Messages for Executing a 
Retrieve-Common Request. 



CC (j21). Once the descriptor-id groups are locked and 
cluster search can be performed, CC signals DM (k27). DM 
then performs cluster search and signals CC to release the 
locks on the descriptor-id groups (m25). Next, DM sends the 
cluster ids for the retrieval to CC (n22). Once the cluster 
ids are locked, and the request can proceed with address 
generation and the rest of the source-request execution, CC 
signals DM (o28). DM then performs address generation and 
sends the source request and the address set to RECP (p16). 
Once the retrieval request has executed properly, RECP sends 
a message to DM to start processing the target request 
(r33). DM processes the target request in the same way as 
the source request (i.e., phases e20 to p16). 
The retrieved records are processed by the hashing module in 
RECP. Once the local target records have been processed 
properly, the hashing module broadcasts the hashed target 
records (grouped by bucket numbers) to the other backends 
via RECP (s34). The hashing modules in the other backends 
send their hashed target records to the hashing module of 
this backend (t34). Once the comparing and merging 
operations have been performed by the hashing module, the 
results are sent to PP (u2). PP then forwards the results to 
the host (v2). 
VI. CONCLUSION 



A. REVIEW AND SUMMARY 

The multi-backend database system (MBDS) in the 
Laboratory for Database System Research at the Naval 
Postgraduate School is designed to overcome the 
performance-gain and capacity-growth problems of either the 
traditional database system or the 
single-backend-software database system. The original MBDS 
supported four primary operations, namely, RETRIEVE, DELETE, 
UPDATE and INSERT. This thesis presented the design and 
implementation of the fifth primary operation, the 
RETRIEVE-COMMON operation. The retrieve-common operation is 
used to merge two files by common attributes. Our major 
goal is to maximize the utilization of the existing system 
and to minimize the effects on it. 

We have analyzed several possible design alternatives 
and then selected the best one for our design and 
implementation approach. The key issues for the selection 
are the conformance to the design requirements, the design 
issues of MBDS and the time complexities of implementation. 
Our design and implementation is based on the bucket-hashing 
approach. Each backend performs a partial merge of its 
portion of the source records with the entire set of target 
records, and sends its results to the controller. The 
controller forwards the final results to the user at the 
host computer. 

Based on the selected design and implementation 
approach, the operations of the retrieve-common request 
are executed in four phases: the request-preprocessing 
phase, the record-retrieving phase, the hashing-and-storing 
phase and the merging phase. The retrieve-common request 
is first parsed by the parser into a transaction of two 
retrieval requests (each of the retrieve-common request type). 
Then, the parsed requests are reformatted into the 
required message formats and broadcast to all the backends 
by the composer of the controller. Each backend receives 
the formatted messages of the transaction, separates the 
source request and the target request, and then performs the 
directory operations and retrieves the records according to 
the queries specified in the requests. The retrieved 
records of the source record set and the records of the 
target record set are separately hashed on their common 
attribute values and then stored into buckets of the source 
hashing table and the target hashing table, respectively. 
The hashed records of the source buckets and the records of 
the target buckets are compared and merged bucket-by-bucket. 
The merged results are sent to the controller from all of 
the backends. The controller then forwards the results to 
the host computer. In order to accomplish the operations of 
the retrieve-common request, we have designed a hashing 
module into the record-processing process of each backend. 
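
The hashing-and-storing phase and the merging phase can be pictured with the following Python sketch. It is a minimal illustration under stated assumptions, not the SSL of the hashing module: records are modelled as dictionaries, and the names NUM_BUCKETS, bucket_of, build_hash_table and merge_tables, as well as the particular hash function, are hypothetical.

```python
from collections import defaultdict

NUM_BUCKETS = 8  # assumed table size, chosen only for the example


def bucket_of(value):
    """Deterministic toy hash: sum of character codes modulo NUM_BUCKETS."""
    return sum(ord(c) for c in str(value)) % NUM_BUCKETS


def build_hash_table(records, common_attr):
    """Hash each record on its common attribute value and store it in a bucket."""
    table = defaultdict(list)
    for record in records:
        table[bucket_of(record[common_attr])].append(record)
    return table


def merge_tables(source_table, source_attr, target_table, target_attr):
    """Compare and merge the two tables bucket-by-bucket."""
    results = []
    for bucket, source_records in source_table.items():
        for s in source_records:
            for t in target_table.get(bucket, []):
                # Equal hash values only mean "same bucket"; the attribute
                # values themselves must still be compared.
                if s[source_attr] == t[target_attr]:
                    results.append({**s, **t})
    return results


source = [{"dept": "D1", "name": "Smith"}, {"dept": "D2", "name": "Jones"}]
target = [{"dno": "D1", "budget": 500}, {"dno": "D3", "budget": 900}]

merged = merge_tables(build_hash_table(source, "dept"), "dept",
                      build_hash_table(target, "dno"), "dno")
```

Only records that fall into the same bucket are ever compared, which is the reason for hashing both record sets with the same function.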

For integrating our design into MBDS, we have made 
several modifications. These affect: 

(1) the message-passing facilities, 

(2) the parser of the request-preparation process of the 
controller, and 

(3) the directory-management process and the 
record-processing process of each backend. 

The algorithms for the modifications and the program 
specifications (SSL) are provided in Chapters IV and V 
and in the Appendices. 
B. FUTURE WORK 

The next step in the design and implementation of the 
retrieve-common operation is the modification of the MBDS 
software according to the SSL given in the appendices. There 
are two classes of modifications. First, existing software 
is updated to reflect the changes necessary for the 
retrieve-common operation. In the system, new message types 
must be defined, the request-preparation and post-processing 
processes of the controller are changed, and the 
directory-management process is changed to correctly 
sequence and execute the retrieve-common request. Second, 
new software is written to handle the processing of the 
retrieve-common request, i.e., the hashing module. In the 
system, the software for the hashing module is coded, tested, 
and integrated into the record-processing process of each 
backend. 
APPENDIX A 

THE MODIFIED REQUEST PREPARATION PROGRAM SPECIFICATIONS 



In this appendix, we present only the modified portions 
of the Request Preparation process. The original SSL is in 
[Ref. 11: p. 87]. 

A. THE LEX MODIFICATIONS 

/* 
 * We have added the regular expression for the token 
 * COMMON into LEX. The rest of LEX remains unchanged. 
 * The original specification is in the lsrc file. 
 */ 

. (The original lsrc specifications.) 

BY { 
    return (TOKBY); 
} 

COMMON { 
    return (TOKCOM); 
} 

"<=" { 
    return (LS); 
} 

. (The original lsrc specifications.) 
B. THE YACC MODIFICATIONS 



In this section, we present only the SSL for the 
modified portion of the parser. The original program is in 
the ysource file. 

procedure yyparse (); 

/* 
 * This procedure is used to parse the output of LEX. 
 * The modification of the yyparse procedure converts 
 * the retrieve-common request from a single request 
 * into a transaction of two requests. 
 * 
 * Data structures and variables used in this 
 * procedure: 
 * 1. No new data structures are introduced by this 
 *    modification. 
 * 2. com_flag_1, com_flag_2, com_flag_3, com_flag: 
 *    Boolean variables which are used to indicate the 
 *    different conditions of the retrieve_common 
 *    request. 
 * 3. new_tbl_ptr: 
 *    A pointer to a request table. 
 *    The request table is defined in the commdata.def 
 *    file as a Reqtbl_definition structure. 
 * 4. com_atrb_1, com_atrb_2: 
 *    Character strings to hold the common attributes. 
 */ 

/* The following is the modified portion of yysource. */ 

/* Add a new token in the specification. */ 
%token [str] TOKCOM    /* common */ 

/* Add new derivations and program specifications. */ 

transaction : beg_tran lines 
                  /* No changes in this part 
                     of the transaction rule. */ 
            | beg_single_req line 
                  if com_flag 
                  then 
                      /* This is a retrieve-common 
                         request. */ 
                      Perform the operations which are 
                      specified under the beg_tran 
                      lines; 
                  else 
                      /* Perform original operations. */ 
                  end if; 

end_req     : EOR 
                  /* Clear the com_flags. */ 
                  com_flag = false; 
                  com_flag_3 = false; 

req_forms   : delete query 
              ... 
              ...    /* These are the 
                        original derivations. */ 
              ... 
            | req_forms common target_list req_forms; 

common      : TOKCOM 
                  perform CHECK_REQUEST_TYPE (req_tbl, OK); 
                  /* Check if the first request is 
                     a retrieve. */ 
                  if OK 
                  then 
                      com_flag = com_flag_1 = true; 
                  else 
                      perform ERROR_PROCEDURE; 
                  end if; 

attribute   : LETTERFIRST 
                  if com_flag_1 
                  then 
                      /* This attribute is the common 
                         attribute of the source 
                         request. Copy the attribute 
                         into com_atrb_1. */ 
                      perform strcpy (com_atrb_1, 
                                      attribute); 

                      /* Put the common attribute of 
                         the source request into 
                         the target list and 
                         convert the request table from 
                         the form of a single request to 
                         the form of a transaction. */ 
                      perform CONVERT (tbl_ptr->req_tbl, 
                                       com_atrb_1, 
                                       traf_id, req_cnt, 
                                       new_tbl_ptr->req_tbl); 
                      com_flag_2 = true; 
                      com_flag_1 = false; 
                      /* com_flag = true */ 
                  else 
                      if com_flag_2 
                      then 
                          /* This attribute is the 
                             common attribute of the 
                             target request. */ 
                          perform strcpy (com_atrb_2, 
                                          attribute); 
                          com_flag_3 = true; 
                          com_flag_2 = false; 
                      else 
                          if com_flag_3 
                          then 
                              /* This is the first 
                                 attribute of the target 
                                 list of the target 
                                 request. */ 
                              insert com_atrb_2 into the 
                              target request table; 
                              insert the attribute into 
                              the target request table; 
                          end if; 
                          /* Perform the original 
                             operations. */ 
                      end if; 
                  end if; 

retrieve    : TOKRETRIEVE 
                  if com_flag_3 
                  then 
                      perform ERROR_PROCEDURE; 
                  else 
                      if com_flag 
                      then 
                          /* Change the type to be 
                             RETRIEVE_COMMON. */ 
                      end if; 
                  end if; 
                  /* Perform the original operations. */ 

delete      : TOKDELETE 
                  if com_flag 
                  then 
                      perform ERROR_PROCEDURE (); 
                  else 
                      /* Perform the original operations. */ 
                  end if; 

insert      : TOKINSERT 
                  if com_flag 
                  then 
                      perform ERROR_PROCEDURE (); 
                  else 
                      /* Perform the original operations. */ 
                  end if; 

update      : TOKUPDATE 
                  if com_flag 
                  then 
                      perform ERROR_PROCEDURE (); 
                  else 
                      /* Perform the original operations. */ 
                  end if; 

/* Perform the original operations. */ 
end procedure yyparse; 



procedure CONVERT (input : source_req_table, source_com_atr, 
                           traf_id, request_number, 
                           index_req_ptr; 
                   output: target_req_table, request_number, 
                           index_req_ptr); 

/* 
 * This procedure is used to rearrange the contents 
 * of the request table of a request which is the 
 * source retrieve of a RETRIEVE_COMMON request. 
 * This procedure performs the following tasks: 
 * 1. Rearrange the source request table. 
 * 2. Make the common attribute of the source request 
 *    the first attribute of the target list. 
 * 3. Create a request table for the target request 
 *    and return it to the calling procedure. 
 * 
 * Data structures and variables used in this 
 * procedure are: 
 * 1. source_req_table, target_req_table: 
 *    The request tables of the source request and 
 *    the target request. 
 * 2. new_table: 
 *    An array of Reqtbl_definition structures. 
 * 3. traf_id: 
 *    A character string which is the traffic id of 
 *    a transaction. 
 * 4. request_number: 
 *    An integer which is used to indicate the 
 *    number of requests in a traffic unit. 
 * 5. index_req_ptr: 
 *    A pointer to a parsed traffic unit, which is 
 *    an array of Reqtbl_definition structures. 
 * 6. source_com_atr: 
 *    A character string which is the common 
 *    attribute of the source request. 
 */ 

/* Use a new request table, new_table, to hold the 
   contents of the source_req_table. */ 
new_table[0] = EOR; 
new_table[1] = str_to_num (traf_id); 
new_table[2] = request_number; 
new_table[3] = routtype;    /* Defined in yyparse (). */ 
new_table[4] = RETRIEVE_COMMON; 

/* Copy the contents of the source request table into 
   the new_table. */ 
i = 5; 
repeat 
    new_table[i] = source_req_table[i]; 
    i = i + 1; 
until source_req_table[i] = EOQ; 

/* Insert the common attribute into the new_table. */ 
new_table[i] = source_com_atr; 
i = i + 1; 

/* Copy the rest of the source_req_table into 
   the new_table. */ 
repeat 
    new_table[i] = source_req_table[i - 1]; 
    i = i + 1; 
until source_req_table[i - 1] = null; 

/* Put an end-of-request marker, EOR, 
   into the new_table. */ 
new_table[i] = EOR; 

/* Copy the new_table into the source_req_table. */ 
i = 0; 
repeat 
    source_req_table[i] = new_table[i]; 
    i = i + 1; 
until source_req_table[i] = EOR; 

/* Increase the request number, and create a request 
   table for the target request. */ 
request_number = request_number + 1; 
perform ALLOCATE_REQ_TABLE (target_req_table); 

/* Put the target_req_table into the 
   parsed traffic unit. */ 
index_req_ptr->req_tbl[request_number - 1] 
    = target_req_table; 

/* Return the request number, target_req_table and 
   index_req_ptr to the calling procedure. */ 
end procedure CONVERT; 
procedure CHECK_REQUEST_TYPE (input: req_tbl; output: OK); 

/* 
 * This procedure is used to check the syntax of a 
 * retrieve_common request. If the request type is 
 * not retrieve, set OK to false. Otherwise, set OK 
 * to true. Return OK to the calling procedure. 
 */ 

end procedure CHECK_REQUEST_TYPE; 

procedure ERROR_PROCEDURE (); 

/* 
 * This procedure is used whenever there is a syntax 
 * error in the request. 
 * This procedure will print an error message and 
 * terminate the parser operations. 
 */ 

end procedure ERROR_PROCEDURE; 
APPENDIX B 

THE MODIFIED DIRECTORY MANAGEMENT PROGRAM SPECIFICATIONS 



The original SSL for the Directory Management process is 
in [Ref. 13: p. 82-102]. In this appendix, we present only 
those procedures which are affected by the retrieve-common 
request. 



procedure DM_ParsedTrafUnit (); 

/* 
 * This procedure is used when Request Preparation 
 * (REQP) sends a traffic unit to Directory 
 * Management (DM). The original procedure is in 
 * the tu.c file. 
 * We add an if statement to differentiate between 
 * the retrieve-common request type and the other 
 * request types. 
 * No new variables are introduced in this procedure. 
 */ 

/* Get a pointer to the parsed traffic unit. */ 
ti_ptr = DM_R$ParsedTrafUnit (); 

/* Get a pointer to the record template 
   of this traffic unit. */ 
tmpl_ptr = get_tmpl_ptr (ti_ptr->ti_dbid); 

/* Get a pointer to the attribute table. */ 
AT = AT_lookuptbl (ti_ptr->ti_dbid); 

/* Get the type-C attributes for the traffic unit 
   and send them to DS_CC. */ 
perform DM_TypeC_Attrs_TrafUnit (); 

/* Process the requests of this traffic unit. */ 
ri_ptr = ti_ptr->ti_first_req_pointer; 

/* Get the type of the first request of 
   this traffic unit. */ 
if req_type = RETRIEVE_COMMON 
then 
    /* We will only process the source request. */ 
    /* The target request will not be processed */ 
    /* until the record-processing process has  */ 
    /* retrieved all of the source records.     */ 

    /* Perform the descriptor search processing. */ 
    done = NINS_SR_DESC (&rid, ri_ptr, tmpl_ptr, AT); 
    if done 
    then 
        /* Broadcast the descriptor ids to the 
           other backends. */ 
        DM_Broadcast_DIDs (&rid); 
    end if; 
else 
    /* This is not a retrieve-common transaction, so 
       process the requests of the traffic unit 
       one-by-one. */ 
end if; 

end procedure DM_ParsedTrafUnit; 



procedure DM_RecP_Msg (); 

/* 
 * This procedure is used when there is a message 
 * for DM from RECP (in the same backend). 
 * 
 * We add a new message type to indicate that all 
 * of the source records have been retrieved. 
 * 
 * No new data structures or variables are used. 
 * The original procedure is called by 
 * DM_THIS_BE_MSG () and is in the dirman.c file. 
 */ 

/* Get the message type. */ 
MsgType = DM_R$Type; 
switch (MsgType) 
    case OldNewValue: 
        perform DM_OldNewValues (); 
    case UpdFinished: 
        perform DM_UpdFinished (); 
    case Source_finished: 
        /* This is the message which indicates the 
           completion of the retrieval of all the 
           source records. */ 
        perform DM_Source_finished (msg); 
end switch; 

end procedure DM_RecP_Msg; 



procedure DM_Source_finished (input: message); 

/* 
 * This procedure is used when DM receives a message 
 * from RECP which indicates the completion of the 
 * retrieval of all of the source records. DM is now 
 * ready to process the target request. 
 * This procedure is called by DM_RecP_Msg (). 
 */ 

/* Receive the request id from the message. */ 
perform DM_R$Rid (source_req_id); 

/* Get a pointer to the traf_info entry by the 
   source_req_id. */ 
ti_ptr = DM_TiFind (source_req_id); 

/* Get a pointer to the req_info entry for the source 
   request. */ 
source_req_info_ptr = DM_RiFind (source_req_id, ti_ptr); 

/* Get a pointer to the req_info entry for the target 
   request by the source_req_info_ptr. */ 
target_ri_ptr = source_req_info_ptr->next_req_info; 

/* Get the request id of the target request. */ 
target_req_id = Find_request_id (target_ri_ptr); 

/* Perform the directory operations on the 
   target request. */ 

/* Get the record template for the target request. */ 
tmpl_ptr = get_tmpl_ptr (ti_ptr->ti_dbid); 

/* Get a pointer to the attribute table. */ 
AT = AT_lookuptbl (ti_ptr->ti_dbid); 

/* Perform the descriptor search processing. */ 
done = NINS_SR_DESC (&rid, ri_ptr, tmpl_ptr, AT); 
if done 
then 
    /* Broadcast the descriptor ids to the other 
       backends. */ 
    perform DM_Broadcast_DIDs (&rid); 
end if; 

end procedure DM_Source_finished; 
APPENDIX C 



THE MODIFIED RECORD PROCESSING PROGRAM SPECIFICATIONS 

In this part of the appendix, we have added the 
retrieve-common subfunction into the control function of the 
physical-data-operation subprocess of the record-processing 
process (RECP). We have presented only the modified portion 
of the original RECP in this appendix. 



procedure ReqProcessing (input: MsgType); 

/* 
 * This procedure is used to process requests according 
 * to the request type. 
 * We add the retrieve-common request type into the 
 * switch statements as one of the optional cases. 
 * 
 * This procedure is called by the procedure RP_DM. The 
 * original procedure is in the recproc.c file. 
 */ 

/* Get the request type. */ 
switch (request_type) 

    RETRIEVE_COMMON: 
        perform ST_RetDel (); 
        /* From this point, we use the same 
           procedures as used for the 
           RETRIEVE request processing. */ 

/* Now, back to the original ReqProcessing (). */ 
end procedure ReqProcessing; 
procedure RP_ReadCompleted (); 

/* 
 * This procedure is used when a physical read is 
 * completed. We add the retrieve-common request 
 * type into its switch statements as one of the 
 * request type cases. 
 * 
 * This procedure is called by the procedure RP_EP. 
 * The original procedure is in the recproc.c file. 
 */ 

/* Get the request type of this request. */ 
switch (request_type) 

    RETRIEVE_COMMON: 
        perform RC_Ret (); 

    RETRIEVE: 
        perform RC_Ret (); 

    /* Now, back to the original processing. */ 
end switch; 

end procedure RP_ReadCompleted; 

procedure RB$SEND_COMPLETION (input: RB_ptr, reqtype); 

/* 
 * This procedure does the following tasks: 
 * 1. Send the contents of the result buffer to 
 *    either the hashing module or the controller, 
 *    depending on the request type. 
 * 2. If this is a source request of a retrieve- 
 *    common request, then send a message to DM 
 *    indicating that all of the source records 
 *    have been retrieved. 
 * 3. Send a message to CC to release the locks on 
 *    the database for this request. 
 * 4. Free the result buffer space after the 
 *    contents of the result buffer have been sent. 
 * 
 * All of the data structures and variables are the 
 * same as in the original procedure. 
 * This procedure is called by the procedure 
 * RC_Ret (). 
 * The original procedure is in the recproc.c file. 
 */ 

/* Get the request id by the result buffer pointer 
   RB_ptr. */ 
request_id = RB_ptr->RB_rid; 
if reqtype = RETRIEVE_COMMON 
then 
    if the result_buffer is full 
    then 
        /* Send the contents of the result buffer */ 
        /* to the hashing module and reinitialize */ 
        /* the buffer size to 0.                  */ 
        perform HASH_FUNC (request_id, result, result_length); 
        result_length = 0; 
    end if; 

    if this is the last result buffer 
       for this request 
    then 
        /* Send the result buffer to the 
           hashing module. */ 
        perform HASH_FUNC (request_id, result, 
                           result_length); 
        if this is a source request 
        then 
            /* Send a message to DM indicating */ 
            /* that all of the source records  */ 
            /* have been retrieved.            */ 
            perform DM_FinReq$RP_S (request_id); 
        end if; 

        /* Free the result buffer space. */ 
        perform Recp_free (request_id); 

        /* Send a message to CC to    */ 
        /* release the locks for this */ 
        /* request.                   */ 
        perform CC_FinReq$RP_S (request_id); 
    end if; 
else 
    /* This request is not a retrieve-common 
       request. 
       Now, back to the original processing. */ 
end if; 

end procedure RB$SEND_COMPLETION; 

procedure XTRACT (input: TRACK_BUFFER, indexB, result2, 
                         request, tmpl_ptr, target_ptr; 
                  output: result2); 

/* 
 * This procedure extracts the attribute names and 
 * values which correspond to the target list 
 * of a record. 
 * This procedure is called by the procedure 
 * RETR_PROCESSING (). 
 * The original procedure is in the rbabs.c file. 
 * We add an end-of-record marker, EOR, at the end 
 * of every record. 
 */ 

/* Process all statements of the original procedure 
   until the end of the outermost while loop. */ 

/* Add the following processing. */ 
if the reqtype = RETRIEVE_COMMON 
then 
    put the EORecord marker into the result buffer; 
end if; 

/* Now, back to the original processing. */ 
end procedure XTRACT; 



procedure RB$PUT_SEND (input: RESULT_BUFFER, result, 
                              length_of_result); 

/* 
 * This procedure puts the results for a request 
 * into the result buffer. If the result buffer is 
 * full, then the contents of the buffer are sent to 
 * the controller or the hashing module and the 
 * length of the buffer is set to 0. 
 * This procedure is called by the procedure 
 * RETR_PROCESSING (). 
 * The original procedure is in the rbabs.c file. 
 */ 

if the result buffer is full 
then 
    /* Find the request type in the result buffer. */ 
    reqtype = FIND_req_type (result_buffer); 
    if reqtype = RETRIEVE_COMMON 
    then 
        /* Send the results to the hashing module. */ 
        perform HASH_FUNC (result_buffer); 
    else 
        /* Send the results to the controller. */ 
        perform RES$CNTL$RP_S (request_id, results, 
                               length_of_result); 
    end if; 

    length_of_result = 0; 
else 
    /* Store the results into the result buffer. */ 
    /* Now, back to the original processing. */ 
end if; 

end procedure RB$PUT_SEND; 
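
The routing decision made when the result buffer fills can be pictured with the following Python sketch. It is an illustration only and not a translation of the rbabs.c code; the class ResultBuffer, its fields, and the two callback parameters are assumptions introduced for the example.

```python
class ResultBuffer:
    """Toy result buffer that routes its contents by request type."""

    def __init__(self, capacity, request_type, to_hashing, to_controller):
        self.capacity = capacity
        self.request_type = request_type
        self.to_hashing = to_hashing        # callback standing in for the hashing module
        self.to_controller = to_controller  # callback standing in for the controller
        self.items = []

    def flush(self):
        """Send the buffered results to their destination and empty the buffer."""
        if not self.items:
            return
        if self.request_type == "RETRIEVE_COMMON":
            self.to_hashing(list(self.items))
        else:
            self.to_controller(list(self.items))
        self.items.clear()

    def put(self, result):
        """Store one result, flushing first if the buffer is already full."""
        if len(self.items) >= self.capacity:
            self.flush()
        self.items.append(result)


hashed, forwarded = [], []
buf = ResultBuffer(2, "RETRIEVE_COMMON", hashed.append, forwarded.append)
for r in ["r1", "r2", "r3"]:
    buf.put(r)
buf.flush()  # the last, partly filled buffer is sent as well
```

For a retrieve-common request every batch goes to the hashing callback and nothing reaches the controller callback at this stage.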
procedure RP_CNL_ANOTHER_BE_MSG (); 

/* 
 * The purpose of this procedure is to process 
 * the messages received from the controller or 
 * the other backends. 
 * This procedure is modified for processing 
 * the hashed information of the non-local target 
 * records. 
 * The original procedure is in the recproc.c file. 
 */ 

/* Get the message type. */ 
MsgType = Type$RP_R; 
case MsgType of 

    Bucket_info: 
        /* This message is the hashed information */ 
        /* for the non-local target records.      */ 
        perform PROCESS_BE_TARGET (); 

        /* This procedure should return the sender, */ 
        /* the request_id of the target request     */ 
        /* and whether or not this is the last      */ 
        /* message from this backend.               */ 

        /* Check to see if all the target records */ 
        /* of all the other backends have been    */ 
        /* received.                              */ 
        if LAST_MSG 
        then 
            perform CHECK_RECEIVE_MSG (sender, 
                        request_id, ALL_RECEIVED); 
        end if; 

        if ALL_RECEIVED 
        then 
            perform START_TO_MERGE (request_id); 
            /* The called routine will perform    */ 
            /* the merging operation and send the */ 
            /* results to the controller.         */ 
        end if; 

    /* Now, back to the original processing. */ 

end case; 

end procedure RP_CNL_ANOTHER_BE_MSG; 



procedure PROCESS_BE_TARGET (input: message; 
                             output: sender, request_id, 
                                     LAST_RECORD); 

/* 
 * This procedure is called to process the message 
 * which contains the hashed bucket information of 
 * the non-local target records. 
 * 
 * This procedure will return the sender of the 
 * message, the request id of those non-local 
 * records and a boolean variable, LAST_RECORD, to 
 * indicate that all of the target records from the 
 * sending backend have been received. 
 * 
 * Data structures and variables used in this 
 * procedure are: 
 * 1. LAST_RECORD: A boolean variable which is 
 *                 used to indicate the end of 
 *                 this request. 
 * 2. message: A character string which is used 
 *             to store the hashed results of 
 *             target records and is sent from 
 *             the other backends. 
 */ 

/* Get the sender of the message. */ 
perform GET_MSG_SENDER (sender); 

/* Get the request id of the request. */ 
perform GET_REQUEST_ID (request_id); 

/* Now, check the global table to find the address */ 
/* of the hashing table for this request.          */ 
perform CHECK_GLOBAL_TABLE (request_id, hash_table, 
                            NEW_REQUEST); 

NEW_RECORD = true; 

/* Since the message is an array of characters,        */ 
/* we have to bypass the header to get the record      */ 
/* information. If this message is the last message    */ 
/* of the sending backend, then there will be an       */ 
/* end-of-request marker, EORequest, in front          */ 
/* of the end-of-message marker.                       */ 

I = the_integer_which_stands_for 
    _the_index_where_record_start; 

/* Get the bucket numbers and their associated       */ 
/* records from the message, then insert them into   */ 
/* the correct buckets of the hashing table.         */ 

while ((not end of message) or (not end of request)) do 
    perform GET_BUCKET_NUMBER (message, I, bucket_number); 
    /* Get the bucket number of the record and the  */ 
    /* record itself from the message, and then     */ 
    /* store the record into the appropriate bucket */ 
    /* of the hashing table by using the            */ 
    /* bucket number.                               */ 
    perform GET_A_RECORD_SET (message, I, set); 
    perform STORE_RECORD_IN_HASH_TABLE (hash_table, 
                bucket_number, set, NEW_RECORD); 
    NEW_RECORD = false; 
end while; 

if EORequest 
    then LAST_RECORD = true; 
    else LAST_RECORD = false; 
end if; 

end procedure PROCESS_BE_TARGET; 
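
As a hedged illustration of the loop above, the following Python sketch unpacks a message body into the buckets of a hashing table. The concrete marker characters, the function name parse_bucket_message and the use of a dictionary as the hashing table are assumptions for the example; the SSL does not fix the actual encoding of EOV, EORecord or EORequest.

```python
EOV = "\x1f"        # assumed end-of-value marker after a bucket number
EORECORD = "\x1e"   # assumed end-of-record marker
EOREQUEST = "\x1d"  # assumed end-of-request marker


def parse_bucket_message(body):
    """Return (hash_table, last_message) for one message body (header removed).

    hash_table maps a bucket number to the list of records stored in it.
    last_message is True when the end-of-request marker was seen.
    """
    hash_table = {}
    last_message = False
    i = 0
    while i < len(body):
        if body[i] == EOREQUEST:
            last_message = True
            break
        # Bucket number: the characters up to the end-of-value marker.
        end_of_number = body.index(EOV, i)
        bucket_number = int(body[i:end_of_number])
        # Record: the characters up to the end-of-record marker.
        end_of_record = body.index(EORECORD, end_of_number + 1)
        record = body[end_of_number + 1:end_of_record]
        hash_table.setdefault(bucket_number, []).append(record)
        i = end_of_record + 1
    return hash_table, last_message


body = ("3" + EOV + "D1,500" + EORECORD +
        "12" + EOV + "D7,900" + EORECORD + EOREQUEST)
table, last = parse_bucket_message(body)
```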
procedure START_TO_MERGE (input: request_id); 

/* 
 * This procedure is called when the target record 
 * set has been received from all of the other 
 * backends. 
 * The input request_id is the request id of the 
 * target request. 
 * The data structures and the variables used in 
 * this procedure are: 
 * 1. TARGET_TABLE: The hashing table for the 
 *                  target request. 
 * 2. SOURCE_TABLE: The hashing table for the 
 *                  source request. 
 * 3. target_id: The request id of the target 
 *               request. 
 * 4. source_id: The request id of the source 
 *               request. 
 */ 

target_id = request_id; 

/* Get the source request id. */ 
perform GET_SOURCE_ID (target_id, source_id); 

/* Get the hashing table of the source request. */ 
perform CHECK_GLOBAL_TABLE (source_id, global_table, 
                            source_hash_table, 
                            NEW_REQUEST); 

/* Get the hashing table of the target request. */ 
perform CHECK_GLOBAL_TABLE (target_id, global_table, 
                            target_hash_table, 
                            NEW_REQUEST); 

/* Merge the records of these two requests and send */ 
/* the results to the controller.                   */ 
perform MERGE (source_id, source_hash_table.address, 
               target_hash_table.address); 
end procedure START_TO_MERGE; 



procedure GET_SOURCE_ID (input: request_id; 
                         output: request_id); 

/* 
 * This procedure is used to find the request id for 
 * the source request by using the request id of the 
 * target request. 
 * Recall that the source request and the target 
 * request have the same traffic id; the difference 
 * between them is that the request number of the 
 * source request is less than that of the target 
 * request by 1. 
 */ 

end procedure GET_SOURCE_ID; 
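
The rule stated in the comment can be sketched in a few lines of Python. The RequestId layout (a traffic id plus a request number) follows the description above, but the type itself and the function name get_source_id are assumptions made for the example.

```python
from typing import NamedTuple


class RequestId(NamedTuple):
    """Assumed layout of a request id: traffic id plus request number."""
    traffic_id: str
    request_number: int


def get_source_id(target_id):
    """The source request shares the traffic id and precedes the target by 1."""
    return RequestId(target_id.traffic_id, target_id.request_number - 1)


source_id = get_source_id(RequestId("T7", 2))
```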
procedure CHECK_RECEIVE_MSG (input: sender, request_id; 
                             output: ALL_RECEIVED); 

/* 
 * This procedure is used to check whether all 
 * of the non-local target records have been 
 * received from all of the other backends for 
 * a particular request. If all of the non-local 
 * target records have been received, then 
 * ALL_RECEIVED is set to true. Otherwise, 
 * ALL_RECEIVED is set to false. 
 */ 

end procedure CHECK_RECEIVE_MSG; 
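
One possible way to keep this bookkeeping is sketched below in Python. The SSL leaves the body of the procedure open, so the class ReceiveTracker and its per-request set of finished senders are purely hypothetical.

```python
class ReceiveTracker:
    """Remember which of the other backends have sent their last message."""

    def __init__(self, other_backends):
        self.other_backends = set(other_backends)
        self.finished = {}  # request id -> set of senders that are done

    def check_receive_msg(self, sender, request_id):
        """Record that sender is done; return True once every backend is."""
        done = self.finished.setdefault(request_id, set())
        done.add(sender)
        return done >= self.other_backends


tracker = ReceiveTracker(["B2", "B3"])
first = tracker.check_receive_msg("B2", "R1")   # one backend still outstanding
second = tracker.check_receive_msg("B3", "R1")  # all backends have reported
```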



procedure CHECK_GLOBAL_TABLE (input: request_id; 
                              output: hash_table, 
                                      NEW_REQUEST); 

/* 
 * This procedure is used to check whether a request 
 * is a new request by checking if the request id is 
 * in the global table. If the id is found, then set 
 * the value of NEW_REQUEST to false and return 
 * NEW_REQUEST and the hash_table of the request. 
 * This procedure has been defined in HASH_FUNC (). 
 */ 

end procedure CHECK_GLOBAL_TABLE; 

procedure GET_BUCKET_NUMBER (input: message, index;
                             output: index, bucket_number);

/****************************************************
 * This procedure is used to extract the bucket
 * number from the message, then return the
 * bucket_number and the incremented index to its
 * caller.
 * Data structures and variables used in this
 * procedure:
 *   1. bucket: A character-string representation
 *              of the bucket number.
 *   2. j: A general purpose index.
 ****************************************************/

    j = 0;
    repeat
        bucket[j] = message[index];
        index = index + 1;
        j = j + 1;
    until message[index] = EOV;

    perform STRING_TO_INTEGER (bucket, bucket_number);
end procedure GET_BUCKET_NUMBER;
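
The extraction step in GET_BUCKET_NUMBER can be illustrated with a short Python sketch. The character chosen for `EOV` is an assumption (the specification names the marker but not its value), and the function returns the index of the marker itself, mirroring the loop condition of the pseudocode.

```python
from typing import Tuple

EOV = "|"  # assumed end-of-value marker; the real value is not given here


def get_bucket_number(message: str, index: int) -> Tuple[int, int]:
    """Read digits from message[index] up to the next EOV marker.

    Returns (new_index, bucket_number), where new_index points at the
    EOV marker that terminated the bucket number.
    """
    end = message.index(EOV, index)          # position of the terminating EOV
    bucket_number = int(message[index:end])  # STRING_TO_INTEGER equivalent
    return end, bucket_number
```

A caller would then advance the returned index by one to step over the marker before reading the next field.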

procedure GET_A_RECORD_SET (input: message, I;
                            output: set);

/****************************************************
 * This procedure is used to extract the common
 * attribute value of a record and the record itself
 * from the message which contains the hashed bucket
 * information of the non-local target records.
 *
 * The data structures and the variables used in
 * this procedure are:
 *   1. set: An array which contains the common
 *           attribute value of a record and the
 *           record itself.
 *   2. J: A general purpose index.
 ****************************************************/

    J = 0;
    repeat
        set[J] = message[I];
        I = I + 1;
        J = J + 1;
    until message[I-1] = EORecord;
end procedure GET_A_RECORD_SET;

APPENDIX D

THE HASHING PROCEDURE PROGRAM SPECIFICATIONS

procedure HASH_FUNCTION (input: request_id, result, length;
                         output: request_id, hashed_result,
                                 length_hashed_result);

/****************************************************
 * The purpose of this procedure is to hash the value
 * of the join attribute into a bucket of the hash
 * table.
 * A hash buffer is reserved to store the hashed
 * results.
 * Data structures and variables used in this
 * procedure are:
 *   1. hash_buffer: A variable of the data type
 *                   hashing_buffer which is used
 *                   to store the records and their
 *                   hashed bucket values, and is
 *                   defined in hashing_module.def.
 *   2. RP_rid_info: The information for a request.
 *                   This structure is defined in
 *                   the commdata.def file.
 *   3. RP_rid_ptr:  A pointer to the data structure
 *                   of type RP_rid_info.
 *   4. req_tbl_ptr: A pointer to a request table.
 *                   The request table is defined in
 *                   the commdata.def file as a
 *                   REQtbl_definition structure.
 *   5. temp_entry:  A variable of data type rt_ntry
 *                   which is defined in commdata.def.
 *   6. tem_ptr:     A pointer to temp_entry.
 *   7. rt_entry:    A pointer to a field of RP_rid_info.
 *                   The type of this field is rt_ntry.
 ****************************************************/

    /* Check if the request id is a new request. */
    if new request
    then
        /* Get the record template to find the value    */
        /* type (i.e., integer, string or float) of the */
        /* common attribute value.                      */
        perform FIND_RP_rid_info (request_id, RP_rid_ptr);

        /* Get a pointer to the request table from the */
        /* RP_rid_info.                                */
        req_tbl_ptr = RP_rid_ptr -> RP_ri_req;

        /* Find the attribute name from */
        /* the request table.           */
        perform FIND_COMMON_ATTRIBUTE (req_tbl_ptr,
                                       attribute_name);

        /* Get a pointer to the entry                */
        /* of the template for the common attribute. */
        tem_ptr = RP_rid_ptr -> RP_ri_tmpl_ptr -> rt_entry;

        /* Get the value type of the common attribute */
        /* from the record template.                  */
        if tem_ptr->temp_entry.value_data_type = 's'
        then
            value_type = string;
        else
            /* If the value type is integer, then  */
            /* we decide which hashing function to */
            /* use.                                */
            MAX = tem_ptr->temp_entry.value_c1;  /* The possible maximum   */
                                                 /* value for this         */
                                                 /* attribute.             */
            MIN = tem_ptr->temp_entry.value_c2;  /* The possible minimum   */
                                                 /* value for this         */
                                                 /* attribute.             */
            if (MAX-MIN) < the_number_of_buckets
            then
                value_type = small_integer;
            else
                range = (MAX-MIN) / the_number_of_buckets;
                value_type = large_integer;
            end if;
        end if;
    end if;

    /* Allocate a buffer to store the hashed results. */
    perform ALLOCATE_HASH_BUFFER (hash_buffer);
    /* Note: we may not want to call this */
    /* routine at this point.             */

    switch (value_type)
        case string:
            perform STRING_HASH (result,
                                 hash_buffer);
        case small_integer:
            perform SMALL_INTEGER_HASH (result, MIN,
                                        hash_buffer);
        case large_integer:
            perform LARGE_INTEGER_HASH (result, MIN,
                                        range,
                                        hash_buffer);
    end switch;

end procedure HASH_FUNC;
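
The choice between the two integer hashing schemes, and the bucket arithmetic each one implies, can be sketched as follows. This is a hedged illustration rather than the MBDS implementation: the function names are hypothetical, and the sketch assumes that the division computing `range` is an integer division, which the specification does not state.

```python
from typing import Tuple


def choose_integer_scheme(min_value: int, max_value: int,
                          number_of_buckets: int) -> Tuple[str, int]:
    """Return (value_type, range_width) for an integer join attribute."""
    if (max_value - min_value) < number_of_buckets:
        # Every distinct value can have a bucket of its own.
        return "small_integer", 1
    # Assumed integer division; several values share one bucket.
    return "large_integer", (max_value - min_value) // number_of_buckets


def integer_bucket(value: int, min_value: int,
                   value_type: str, range_width: int) -> int:
    """Compute the bucket number as the two integer procedures describe."""
    if value_type == "small_integer":
        return value - min_value
    return (value - min_value) // range_width  # TRUNC[(value - MIN) / range]
```

With made-up limits of 0 and 204800 and 2048 buckets, the range width is 100 and the value 12345 falls into bucket 123. Under the same assumption the maximum value itself would map to bucket 2048, so an implementation may need to clamp the result to the last bucket.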

procedure FIND_COMMON_ATTRIBUTE (input: request_table;
                                 output: attribute_name);

/****************************************************
 * This procedure is used to find the name of the
 * join attribute.
 * The join attribute is the first attribute of the
 * target list, so we can just go to the entry
 * where the target list begins and extract the first
 * attribute name and then return it to the calling
 * procedure.
 ****************************************************/

end procedure FIND_COMMON_ATTRIBUTE;

procedure ALLOCATE_BUFFER (input: request_id;
                           output: hash_buffer);

/******************************************************/
/* This procedure is used to allocate a buffer for    */
/* storing the records and their hashed bucket number,*/
/* set the length of the buffer to 0, and then        */
/* return the buffer to the calling procedure.        */
/*                                                    */
/* The data structures and the variables used in      */
/* this procedure are:                                */
/*   1. hash_buffer:                                  */
/*      A variable of the data type hashing_buffer,   */
/*      which is defined in hashing_module.def        */
/*      (see Appendix G).                             */
/*   2. HB_ptr:                                       */
/*      A pointer to the hash_buffer.                 */
/*   3. HB_id:                                        */
/*      A field name of the hash_buffer that          */
/*      contains the request id of the records        */
/*      which belong to this buffer.                  */
/******************************************************/

    HB_ptr = allocate the hash buffer;
    HB_ptr->HB_id = request_id;
    HB_ptr->length = 0;
end procedure ALLOCATE_BUFFER;



procedure STRING_HASH (input: result_buffer, h_buffer);

/****************************************************
 * This procedure is called when the value type
 * of the common attribute is a character string.
 * It performs the following tasks:
 *   1. Extract records from the input result buffer
 *      one at a time.
 *   2. Extract the value of the join attribute
 *      from the extracted record and then check the
 *      lookup table to get the bucket number for
 *      the record.
 *   3. Store the bucket number and the record into
 *      a reserved hash buffer, h_buffer.
 *   4. If the hash buffer is full, then send the
 *      hash buffer to the bucket-block tracking
 *      procedure.
 *
 * Data structures and variables used in this
 * procedure are:
 *   1. attribute_value: A character-string
 *                  representation of the common
 *                  attribute value.
 *   2. record: A character-string representation
 *                  of the extracted record.
 *   3. bucket_number: The bucket number where the
 *                  record characterized by the
 *                  common attribute value is
 *                  hashed into.
 *   4. bucket: A character-string representation
 *                  of the bucket_number.
 *   5. EOV: The end-of-value marker.
 *   6. EON: The end-of-name marker.
 *   7. EOB: The end-of-buffer marker.
 *   8. LAST_RECORD: A boolean variable to indicate
 *                  that this record is the last
 *                  record for the request.
 *   9. i: The index for the length of the result
 *         buffer.
 *      j: A general purpose index.
 *  10. lookup: The lookup table, which is an array
 *              with 2048 character-string elements:
 *
 *                     0    abal
 *                     1    abc
 *                     .    .
 *                     .    .
 *                  2047    zyth
 *
 *  11. h_buffer: A variable of type hash_buffer
 *                which is defined in
 *                hashing_module.def (see Appendix G)
 *                and is used to store records and
 *                their hashed values.
 ****************************************************/

    /* Get the lookup table. */
    i = 1;
    j = 0;
    LAST_RECORD = false;

    /* Get records from the result buffer one at a time. */
    while result_buffer[i] <> EOB do

        /* Bypass the name of the common attribute. */
        while result_buffer[i] <> EON do
            i = i + 1;
        end while;    /* Now, result_buffer[i] = EON. */
        i = i + 1;

        /* Get the value of the join attribute. */
        j = 0;
        while result_buffer[i] <> EOV do
            attribute_value[j] = result_buffer[i];
            i = i + 1;
            j = j + 1;
        end while;    /* Now, result_buffer[i] = EOV. */

        /* Compare the common attribute value with     */
        /* the contents of the lookup table to get the */
        /* bucket number.                              */
        bucket_number = BI_SEARCH (lookup, attribute_value);
        perform NUMBER_TO_STRING (bucket_number, bucket);

        /* Add an EOV marker to the end of */
        /* the attribute value.            */
        attribute_value[j] = EOV;

        /* Extract the record from the buffer. */
        i = i + 1;
        j = 0;
        repeat
            record[j] = result_buffer[i];
            i = i + 1;
            j = j + 1;
        until result_buffer[i-1] = EORecord;

        /* Now, record[j-1] = EORecord. */
        if result_buffer[i] = EORequest
        then
            LAST_RECORD = true;
            i = i + 1;
        end if;

        /* Store the hashed information into the */
        /* hash buffer, h_buffer.                */
        perform PUT_HASH_BUFFER (h_buffer, bucket,
                                 attribute_value, record,
                                 LAST_RECORD);
    end while;

end procedure STRING_HASH;
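
The lookup step of STRING_HASH can be illustrated with a binary search over a sorted table. The sketch below is an assumption about how BI_SEARCH might behave, since its specification is not given here: it treats each table entry as the lower boundary of a bucket and returns the index of the last entry that is not greater than the attribute value. The four-entry table is a made-up stand-in for the 2048-element lookup table.

```python
from bisect import bisect_right
from typing import List


def bi_search(lookup: List[str], attribute_value: str) -> int:
    """Return the index of the last boundary string <= attribute_value.

    Values that sort before the first boundary are placed in bucket 0.
    """
    return max(bisect_right(lookup, attribute_value) - 1, 0)


# Hypothetical, shortened lookup table (must be sorted).
lookup = ["abal", "abc", "mno", "zyth"]
```

Because the table is sorted, each lookup costs on the order of log2(2048) = 11 comparisons for the full-size table.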



procedure PUT_HASH_BUFFER (input: h_buffer,
                                  bucket,
                                  attribute_value, record,
                                  LAST_RECORD;
                           output: h_buffer);

/****************************************************
 * This procedure is used to store the hashed
 * record information into the hash_buffer.
 *
 * Data structures and variables used in this
 * procedure are:
 *   1. X,Y,Z,i,j,K: General purpose indexes.
 *   2. MAX: The predefined maximum length of the
 *           hash buffer.
 *   3. bucket: A character-string representation
 *              of bucket_number.
 *   4. record: The input record which is in the
 *              form of a character string.
 *   5. LAST_RECORD: A boolean variable which is
 *                   used to indicate the end of
 *                   this request.
 *   6. h_buffer: A buffer which is used to store
 *                records and their hashed values.
 ****************************************************/

    /* Check to see if the buffer has enough space for */
    /* the new record.                                 */
    X = String_len (bucket);
    Y = String_len (attribute_value);
    Z = String_len (record);
    K = the_current_length_of_the_hash_buffer;
    if (K + X + Y + Z) > MAX
    then
        /* The buffer is full, so it is sent to the */
        /* bucket-block tracking procedure.         */
        perform BUCKET_BLOCK (h_buffer);
        /* Reset the length of the buffer to 0. */
        K = 0;
    end if;

    /* The buffer now has enough space, so store the */
    /* input record into the buffer.                 */
    for i = 1 to X do
        K = K + 1;
        hash_result[K] = bucket[i];
    end for;

    for i = 1 to Y do
        K = K + 1;
        hash_result[K] = attribute_value[i];
    end for;

    for i = 1 to Z do
        K = K + 1;
        hash_result[K] = record[i];
    end for;

    /* If this is the last record of this request, */
    /* then send the hash_buffer to the            */
    /* bucket-block tracking procedure.            */
    if LAST_RECORD
    then
        hash_result[K+1] = EORequest;
        hash_result[K+2] = EOB;
        perform BUCKET_BLOCK (h_buffer);
        perform FREE_BUFFER_SPACE (h_buffer);
    end if;

end procedure PUT_HASH_BUFFER;
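
The buffering behaviour of PUT_HASH_BUFFER can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the marker characters are invented, the `send` callback stands in for BUCKET_BLOCK, and the sketch flushes a full buffer before storing the incoming entry so that no record is lost.

```python
EOREQUEST = "#"  # assumed end-of-request marker
EOB = "$"        # assumed end-of-buffer marker


class HashBuffer:
    """Accumulates hashed entries and hands full buffers to a callback."""

    def __init__(self, max_length, send):
        self.max_length = max_length  # plays the role of MAX
        self.send = send              # stand-in for BUCKET_BLOCK
        self.hash_result = ""

    def put(self, bucket, attribute_value, record, last_record=False):
        entry = bucket + attribute_value + record
        # Flush first if the new entry would not fit.
        if len(self.hash_result) + len(entry) > self.max_length:
            self._flush()
        self.hash_result += entry
        if last_record:
            # Close the request and the buffer, then send what is left.
            self.hash_result += EOREQUEST + EOB
            self._flush()

    def _flush(self):
        self.send(self.hash_result)
        self.hash_result = ""
```

A usage example: with `sent = []` and `buf = HashBuffer(10, sent.append)`, two seven-character entries cause one flush for the first entry and a final flush carrying the second entry plus the two trailing markers.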



procedure SMALL_INTEGER_HASH (input: result_buffer,
                                     MIN,
                                     h_buffer;
                              output: h_buffer);

/****************************************************
 * This procedure is used when the type of the
 * common attribute value is integer and when the
 * difference of the maximum and minimum value of
 * the common attribute value is less than the
 * number of the buckets of the hashing table.
 * It performs the following tasks:
 *   1. Extract records from the input result buffer
 *      one at a time.
 *   2. Extract the value of the common attribute from
 *      the extracted record and then calculate
 *      the bucket number.
 *   3. Store the bucket number and the record into
 *      a reserved hash buffer.
 * Data structures and variables used in this
 * procedure are:
 *   1. attribute_value: A character-string
 *                  representation of the common
 *                  attribute value.
 *   2. record: A character-string representation
 *                  of the extracted record.
 *   3. bucket_number: The bucket number where the
 *                  record characterized by the
 *                  common attribute value is
 *                  hashed into.
 *   4. bucket: A character-string representation
 *                  of the bucket_number.
 *   5. EOV: The end-of-value marker.
 *   6. EON: The end-of-name marker.
 *   7. EOB: The end-of-buffer marker.
 *   8. LAST_RECORD: A boolean variable to indicate
 *                  that this record is the last
 *                  record for the request.
 *   9. i: The index for the length of the result
 *         buffer.
 *      j: A general purpose index.
 *      k: The index for the length of the
 *         attribute_value.
 *  10. temp: An integer representation of the input
 *            attribute_value.
 *  11. h_buffer: A variable of type hash_buffer
 *                which is defined in
 *                hashing_module.def (see Appendix G)
 *                and is used to store records and
 *                their hashed values.
 ****************************************************/

    /* Initialize the indexes. */
    i = 1;
    k = 1;
    j = 0;
    LAST_RECORD = false;

    /* Get the records from the result buffer */
    /* one at a time.                         */
    while result_buffer[i] <> EOB do

        /* Bypass the name of the common attribute. */
        while result_buffer[i] <> EON do
            i = i + 1;
        end while;    /* Now, result_buffer[i] is EON. */
        i = i + 1;

        /* Get the value of the common attribute. */
        k = 1;
        while result_buffer[i] <> EOV do
            attribute_value[k] = result_buffer[i];
            i = i + 1;
            k = k + 1;
        end while;    /* Now, result_buffer[i] is EOV. */

        /* Compute the bucket number. */
        perform STRING_TO_NUMBER (attribute_value, temp);
        bucket_number = temp - MIN;
        perform NUMBER_TO_STRING (bucket_number, bucket);

        /* Add an EOV marker to the end of attribute_value. */
        attribute_value[k] = EOV;

        /* Get the attribute-value pairs of the actual */
        /* target list of the record.                  */
        i = i + 1;
        j = 0;
        repeat
            record[j] = result_buffer[i];
            i = i + 1;
            j = j + 1;
        until result_buffer[i-1] = EORecord;

        /* Now, record[j-1] is EORecord. */
        if result_buffer[i] = EORequest
        then
            LAST_RECORD = true;
            i = i + 1;
        end if;

        /* Store the hashed information into the h_buffer. */
        perform PUT_HASH_BUFFER (h_buffer, bucket,
                                 attribute_value, record,
                                 LAST_RECORD);
    end while;

end procedure SMALL_INTEGER_HASH;



procedure LARGE_INTEGER_HASH (input: result_buffer,
                                     MIN, range,
                                     h_buffer;
                              output: h_buffer);

/****************************************************
 * This procedure is used when the type of the
 * common attribute value is integer and when the
 * difference of the maximum and minimum value of
 * the common attribute value is greater than the
 * number of the buckets of the hashing table.
 * It performs the following tasks:
 *   1. Extract records from the input result buffer
 *      one at a time.
 *   2. Extract the value of the common attribute from
 *      the extracted record and then calculate
 *      the bucket number.
 *   3. Store the bucket number and the record into
 *      a reserved hash buffer.
 * Data structures and variables used in this
 * procedure are:
 *   1. attribute_value: A character-string
 *                  representation of the common
 *                  attribute value.
 *   2. record: A character-string representation
 *                  of the extracted record.
 *   3. bucket_number: The bucket number where the
 *                  record characterized by the
 *                  common attribute value is
 *                  hashed into.
 *   4. bucket: A character-string representation
 *                  of the bucket_number.
 *   5. EOV: The end-of-value marker.
 *   6. EON: The end-of-name marker.
 *   7. EOB: The end-of-buffer marker.
 *   8. LAST_RECORD: A boolean variable to indicate
 *                  that this record is the last
 *                  record for the request.
 *   9. i: The index for the length of the result
 *         buffer.
 *      j: A general purpose index.
 *      k: The index for the length of the
 *         attribute_value.
 *  10. temp: An integer representation of the input
 *            attribute_value.
 *  11. h_buffer: A variable of type hash_buffer
 *                which is defined in
 *                hashing_module.def (see Appendix G)
 *                and is used to store records and
 *                their hashed values.
 ****************************************************/

    /* Initialize the indexes. */
    i = 1;
    k = 1;
    j = 0;
    LAST_RECORD = false;

    /* Get records from the result buffer one at a time. */
    while result_buffer[i] <> EOB do

        /* Bypass the name of the common attribute. */
        while result_buffer[i] <> EON do
            i = i + 1;
        end while;    /* Now, result_buffer[i] is EON. */
        i = i + 1;

        /* Get the value of the join attribute. */
        k = 1;
        while result_buffer[i] <> EOV do
            attribute_value[k] = result_buffer[i];
            i = i + 1;
            k = k + 1;
        end while;    /* Now, result_buffer[i] is EOV. */

        /* Compute the bucket number. */
        perform STRING_TO_NUMBER (attribute_value, temp);
        bucket_number = TRUNC[(temp - MIN) / range];
        perform NUMBER_TO_STRING (bucket_number, bucket);

        /* Add an EOV marker to the end of attribute_value. */
        attribute_value[k] = EOV;

        /* Get the attribute-value pairs of the actual */
        /* target list of the record.                  */
        i = i + 1;
        j = 0;
        repeat
            record[j] = result_buffer[i];
            i = i + 1;
            j = j + 1;
        until result_buffer[i-1] = EORecord;

        /* Now, record[j-1] is EORecord. */
        if result_buffer[i] = EORequest
        then
            LAST_RECORD = true;
            i = i + 1;
        end if;

        /* Store the hashed information into the h_buffer. */
        perform PUT_HASH_BUFFER (h_buffer, bucket,
                                 attribute_value, record,
                                 LAST_RECORD);
    end while;

end procedure LARGE_INTEGER_HASH;

APPENDIX E

THE BUCKET-BLOCK-TRACKING PROCEDURE PROGRAM SPECIFICATIONS

procedure BUCKET_BLOCK (input: H_buffer);

/****************************************************
 * This procedure receives a hash buffer, H_buffer,
 * from the ret_com subfunction and performs the
 * following tasks:
 *   1. Establish and maintain a global table to
 *      store the addresses of the hashing tables
 *      of all the requests.
 *   2. Extract the hashed record information from
 *      the input hash_buffer.
 *   3. Check the global table to see if the input
 *      records belong to a new request. If they do,
 *      then allocate a new hashing table.
 *      Otherwise, get the logical address of the
 *      hashing table from the global table and
 *      assign a pointer to the hashing table.
 *   4. Group records into the buckets according to
 *      their bucket numbers and store them into
 *      blocks.
 *   5. Broadcast the bucket information of the local
 *      target records to the other backends.
 *   6. Store the hashing table back to the secondary
 *      storage.
 *
 * Data structures and variables used in this
 * procedure are:
 *
 *   1. FIRST_RET_COM:
 *        A boolean variable which is set to
 *        true when the first retrieve-common
 *        request enters the system.
 *
 *   2. GT_ptr:
 *        A pointer to a global table.
 *   3. G_table:
 *        A variable of type global table (see
 *        Appendix G).
 *
 *   4. HT_ptr:
 *        A pointer to a hashing table.
 *   5. HT:
 *        A variable of type Hash_table (see
 *        Appendix G).
 *
 *   6. HB_ptr:
 *        A pointer to a hash buffer.
 *   7. H_buffer:
 *        A variable of type hash_buffer (see
 *        Appendix G).
 *
 *   8. NEW_REQUEST:
 *        A boolean variable which is set to
 *        true if the request id cannot be found
 *        in the global table.
 *   9. logical_addr:
 *        A variable of type addr_definition,
 *        which is defined in the commdata.def file.
 *  10. bucket_number:
 *        The bucket number where the record
 *        characterized by the attribute value is
 *        hashed into.
 *  11. bucket:
 *        A character-string representation of
 *        the bucket_number.
 *  12. req_id:
 *        A record which contains the traffic id and
 *        request number of a request.
 *  13. i, j:
 *        General purpose indexes.
 ****************************************************/
    if FIRST_RET_COM
    then
        perform INITIALIZE_GLOBAL_TABLE (GT_ptr);
        FIRST_RET_COM = false;
    end if;

    /* Get the request id from the input hash buffer. */
    req_id = H_buffer.Request_id;

    /* Check the global table to see if this request is */
    /* a new request.                                   */
    perform CHECK_GLOBAL_TABLE (GT_ptr, req_id,
                                logical_addr, NEW_REQUEST);

    if NEW_REQUEST
    then
        perform ALLOCATE_HASH_TABLE (logical_addr);
        perform INSERT_GLOBAL_TABLE (GT_ptr, req_id,
                                     logical_addr);
    end if;

    perform GET_HASHING_TABLE (req_id,
                               logical_addr, HT);

    /* Now, the hashing table is ready to store records. */

    /* Extract the record information from the            */
    /* hash buffer one record at a time.                  */
    /* Because the last two characters of the hash buffer */
    /* are the EORequest marker, which indicates whether  */
    /* this is the last hash buffer for this request,     */
    /* and the EOBuffer marker, which indicates the       */
    /* end of this hash buffer, the actual length of the  */
    /* hash buffer is length-2.                           */
    j = 1;
    NEW_RECORD = true;
    while j < (H_buffer.length-2) do

        /* Get the bucket number. */
        i = 0;
        repeat
            bucket[i] = H_buffer.Hashed_result[j];
            i = i + 1;
            j = j + 1;
        until H_buffer.Hashed_result[j] = EOV;

        /* Convert the bucket number from a character */
        /* string to an integer.                      */
        bucket_number = STRING_TO_INTEGER (bucket);

        /* Get the common attribute value and the record */
        /* itself.                                       */
        j = j + 1;
        i = 0;
        repeat
            common_and_record[i] = H_buffer.Hashed_result[j];
            i = i + 1;
            j = j + 1;
        until common_and_record[i-1] = EORecord;

        /* Store the record and its common attribute value */
        /* into the hashing table.                         */
        perform STORE_RECORD_IN_HASH_TABLE (HT, bucket_number,
                                            common_and_record,
                                            NEW_RECORD);
        NEW_RECORD = false;
    end while;

    /* Check if this is a target request. */
    if MOD (req_id.request_no, 2) = 0
    then
        /* This is a target request. */
        perform BROADCAST_TARGET_INFO (HT);
    end if;

    perform STORE_BACK (HT, logical_addr);
end procedure BUCKET_BLOCK;
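
The central loop of BUCKET_BLOCK, which walks the hash buffer and groups each entry under its bucket number, can be sketched in Python. The marker characters and the function name are assumptions made for illustration; the layout assumed for each entry (bucket number, EOV, common attribute value, EOV, record, EORecord) follows the order in which PUT_HASH_BUFFER stores the fields.

```python
from collections import defaultdict
from typing import Dict, List

# Assumed marker characters; the real values are not given in this text.
EOV, EORECORD, EOREQUEST, EOB = "|", ";", "#", "$"


def group_by_bucket(hashed_result: str) -> Dict[int, List[str]]:
    """Split a hash buffer into {bucket_number: [attribute+record, ...]}."""
    body = hashed_result
    # Drop the trailing end-of-buffer and end-of-request markers, if present.
    if body.endswith(EOB):
        body = body[:-1]
    if body.endswith(EOREQUEST):
        body = body[:-1]

    buckets: Dict[int, List[str]] = defaultdict(list)
    j = 0
    while j < len(body):
        end = body.index(EOV, j)              # bucket number ends at EOV
        bucket_number = int(body[j:end])
        j = end + 1
        end = body.index(EORECORD, j)         # entry ends at EORecord
        buckets[bucket_number].append(body[j:end + 1])
        j = end + 1
    return dict(buckets)
```

In the specification the grouped entries are then written into record blocks on secondary storage; the dictionary here only stands in for that hashing table.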



procedure INITIALIZE_GLOBAL_TABLE (output: GT_ptr);

/****************************************************
 * This procedure is used when the first retrieve-
 * common request is executed in the BUCKET_BLOCK
 * procedure.
 * This procedure creates a global table and
 * returns the pointer (GT_ptr) to the table to
 * the calling procedure.
 ****************************************************/

end procedure INITIALIZE_GLOBAL_TABLE;

procedure ALLOCATE_HASH_TABLE (output: logical_addr);

/****************************************************
 * This procedure is used to allocate a hashing
 * table for a new retrieve-common request from
 * a predefined secondary storage area and return
 * the logical disk address to the calling
 * procedure.
 * The bucket entries are also initialized.
 ****************************************************/

end procedure ALLOCATE_HASH_TABLE;

procedure CHECK_GLOBAL_TABLE (input: GT_ptr, request_id;
                              output: logical_addr, NEW_REQUEST);

/****************************************************
 * This procedure is used to check whether a request
 * is a new request by checking its request id
 * against the global table. If the request id is
 * found in the global table, then set the value of
 * NEW_REQUEST to false and return the logical disk
 * address of the hashing table to the calling
 * procedure. Otherwise, return the NEW_REQUEST
 * back to the calling procedure.
 ****************************************************/

end procedure CHECK_GLOBAL_TABLE;



procedure INSERT_GLOBAL_TABLE (input: GT_ptr, Req_id,
                                      logical_addr;
                               output: GT_ptr);

/****************************************************
 * This procedure is used to insert a new hashing
 * table into the global table.
 *
 * Data structures and variables used in this
 * procedure are:
 *   1. GT_ptr:
 *        A pointer to the global table.
 *   2. Req_id:
 *        The request id of the records of the new
 *        hashing table.
 *   3. logical_addr:
 *        The logical disk address of the new hashing
 *        table.
 *
 * An inverted list implementation to maintain the
 * table is recommended.
 ****************************************************/

end procedure INSERT_GLOBAL_TABLE;
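
The interplay of CHECK_GLOBAL_TABLE and INSERT_GLOBAL_TABLE can be sketched with a small Python class. Note the substitution: the specification recommends an inverted list, whereas this sketch uses a plain dictionary keyed by request id purely for brevity. The class and method names are hypothetical.

```python
from typing import Dict, Hashable, Optional, Tuple


class GlobalTable:
    """Maps a request id to the logical disk address of its hashing table."""

    def __init__(self) -> None:
        self._entries: Dict[Hashable, int] = {}

    def check(self, request_id: Hashable) -> Tuple[Optional[int], bool]:
        """Return (logical_addr, new_request) for request_id."""
        if request_id in self._entries:
            return self._entries[request_id], False
        return None, True   # unknown id: this is a new request

    def insert(self, request_id: Hashable, logical_addr: int) -> None:
        """Record the hashing table address of a new request."""
        self._entries[request_id] = logical_addr
```

A caller in the style of BUCKET_BLOCK would call `check` first and, when the request turns out to be new, allocate a hashing table and register its address with `insert`.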



procedure STORE_RECORD_IN_HASH_TABLE
                      (input: HT, bucket_number,
                              info, NEW_RECORD);

/****************************************************
 * This procedure is used to store the common
 * attribute value of a record and the record itself
 * into a hashing table.
 * Recall that the records are stored in blocks.
 *
 * Data structures and the variables used in this
 * procedure are:
 *   1. HT:
 *        A variable of type hash_table which is
 *        defined in hashing_module.def (see Appendix
 *        G).
 *   2. bucket_number:
 *        The bucket number where the record
 *        characterized by the common attribute value
 *        is hashed into.
 *   3. info:
 *        A character string which contains the
 *        common attribute value of a record and the
 *        record itself.
 *   4. NEW_RECORD:
 *        A boolean variable to indicate whether the
 *        input info is a new record of this request
 *        id.
 *   5. old_bucket_number:
 *        The bucket_number of the previous input
 *        record.
 *   6. bkt:
 *        A variable of type BUCKET_ENTRY which is
 *        defined in hashing_module.def (see Appendix
 *        G).
 *   7. blk_ptr:
 *        A pointer to a record block of type
 *        REC_BLOCK which is defined in
 *        hashing_module.def (see Appendix G).
 *   8. blk, blk_2:
 *        Variables of type REC_BLOCK which is defined
 *        in hashing_module.def (see Appendix G).
 *   9. I:
 *        An integer variable.
 *  10. MAX_BLOCK_SIZE:
 *        An integer that represents the maximum
 *        length of the block content.
 ****************************************************/
    if NEW_RECORD
    then
        /* This record is the first input record of this */
        /* request.                                      */
        perform GET_THE_BUCKET (HT, bucket_number, bkt);
        perform ALLOCATE_REC_BLOCK (blk, addr);
        perform MODIFY_ENTRY_&_HEADER (bkt, blk, addr);
    else
        /* Compare the input bucket_number with the */
        /* previous one.                            */
        if bucket_number <> old_bucket_number
        then
            perform STORE_BACK (blk);

            /* Get the desired bucket entry for this */
            /* input record.                         */
            bkt = HT.bkt_entries[bucket_number];

            /* Check if the bucket is empty. */
            if bkt.status = empty
            then
                perform ALLOCATE_REC_BLOCK (blk, addr);
                perform MODIFY_ENTRY_&_HEADER (bkt,
                                               blk, addr);
            else
                /* Get the record block by the address */
                /* in the bucket entry.                */
                perform GET_REC_BLOCK (bkt.block_address,
                                       blk);
            end if;
        end if;

        /* Check if the block has enough space to */
        /* store this record.                     */
        I = STRING_LENGTH (info);
        if (blk.header.length + I) > MAX_BLOCK_SIZE
        then
            /* This block does not have enough space */
            /* for this record.                      */
            perform ALLOCATE_REC_BLOCK (blk_2,
                                        addr_2);
            perform MODIFY_ENTRY_&_HEADER (bkt,
                                           blk_2,
                                           addr_2);
            /* This routine will also modify */
            /* the header of blk_2.          */
            perform STORE_BACK (blk);
            blk = blk_2;
        end if;
    end if;

    /* Remember the bucket number for the next input record. */
    old_bucket_number = bucket_number;
    perform STORE_INFO_IN_BLOCK (info, blk);
end procedure STORE_RECORD_IN_HASH_TABLE;



procedure STORE_BACK (input: A_structure);

/****************************************************
 * This procedure is used to store a hashing table,
 * or a record block, back to the secondary storage.
 *
 * A_structure is a variable which may be either
 * a hashing table or a block.
 ****************************************************/

end procedure STORE_BACK;



procedure GET_REC_BLOCK (input: logical_addr;
                         output: blk);

/****************************************************
 * This procedure is used to bring a block of memory
 * from a predefined secondary storage area into the
 * primary memory by its logical address.
 * Data structures and variables used in this
 * procedure are:
 *   1. logical_addr:
 *        The logical address of a block.
 *        A variable of type addr_definition which is
 *        defined in the commdata.def file.
 *   2. blk:
 *        A variable of type REC_BLOCK which is defined
 *        in hashing_module.def (see Appendix G).
 ****************************************************/

end procedure GET_REC_BLOCK;

procedure STORE_INFO_IN_BLOCK (input: info, blk);

/****************************************************
 * This procedure is used to store the common
 * attribute value of a record and the record
 * itself into a block.
 * It is called only when the block has enough
 * space for that information, i.e., info.
 * Data structures and variables used in this
 * procedure are:
 *   1. info:
 *        A character string which contains the
 *        common attribute value of a record and
 *        the record itself.
 *   2. blk:
 *        A variable of type REC_BLOCK which is
 *        defined in hashing_module.def (see
 *        Appendix G).
 *   3. i, j:
 *        General purpose indexes.
 ****************************************************/

    i = 0;
    j = blk.header.length + 1;
    repeat
        blk.contents[j] = info[i];
        i = i + 1;
        j = j + 1;
    until i = STRING_LENGTH (info);
end procedure STORE_INFO_IN_BLOCK;

procedure MODIFY_ENTRY_&_HEADER (input: bkt, blk,
                                        blk_addr;
                                 output: bkt, blk);

/****************************************************
 * This procedure is used to modify the bucket
 * entry of the input bkt and the header part
 * of the input blk. It will then return the
 * modified bkt and blk back to the calling
 * procedure.
 *
 * Data structures and variables used in this
 * procedure:
 *   1. bkt:
 *        A variable of type Bucket_entry
 *        which is defined in hashing_module.def
 *        (see Appendix G).
 *   2. blk:
 *        A variable of type REC_BLOCK which
 *        is defined in hashing_module.def
 *        (see Appendix G).
 *   3. blk_addr:
 *        A variable of type addr_definition
 *        which is the logical address of a block
 *        and is defined in the commdata.def file.
 ****************************************************/

    blk.header.next_blk_addr = bkt.block_address;
    bkt.block_address = blk_addr;
end procedure MODIFY_ENTRY_&_HEADER;
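
The two assignments in MODIFY_ENTRY_&_HEADER place a new block at the front of a bucket's chain of blocks. The Python sketch below illustrates that chaining. The data classes are simplified stand-ins for the Appendix G structures, the `storage` dictionary stands in for secondary storage, and the use of `None` to end a chain is an assumption of this sketch (the broadcast procedure appears to detect the last block differently, by comparing a block's next address with its own address).

```python
from dataclasses import dataclass
from typing import Dict, Iterator, Optional


@dataclass
class RecBlock:
    """Simplified record block: header link plus contents."""
    next_blk_addr: Optional[int] = None
    contents: str = ""


@dataclass
class BucketEntry:
    """Simplified bucket entry; None means the bucket is empty."""
    block_address: Optional[int] = None


def modify_entry_and_header(bkt: BucketEntry, blk: RecBlock,
                            blk_addr: int) -> None:
    """Link blk in front of the bucket's current chain of blocks."""
    blk.next_blk_addr = bkt.block_address   # new block points at old head
    bkt.block_address = blk_addr            # bucket points at new block


def walk(bkt: BucketEntry, storage: Dict[int, RecBlock]) -> Iterator[RecBlock]:
    """Yield the blocks of a bucket, newest first."""
    addr = bkt.block_address
    while addr is not None:
        yield storage[addr]
        addr = storage[addr].next_blk_addr
```

Because each new block is placed at the head, a traversal visits the most recently allocated block first.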

procedure BROADCAST_TARGET_INFO (input: HT);
/******************************************************
* This procedure is used to broadcast the records     *
* of the target hashing table to the other            *
* backends.                                           *
* This is the same procedure that is used to          *
* broadcast the descriptor ids among backends.        *
*                                                     *
* Data structures and variables used in this          *
* procedure are:                                      *
* 1. HT:                                              *
*      A variable of type hashing_table               *
*      which is defined in hashing_module.def         *
*      (see Appendix G).                              *
* 2. i:                                               *
*      A general purpose index.                       *
* 3. MAX_BKT_#:                                       *
*      An integer which is used to represent the      *
*      maximum number of the bucket entries in a      *
*      hashing table.                                 *
* 4. bkt:                                             *
*      A variable of type Bucket_entry which          *
*      is defined in hashing_module.def (see          *
*      Appendix G).                                   *
* 5. msg:                                             *
*      A character string which is used to store      *
*      the message that is to be broadcast to all     *
*      of the backends.                               *
******************************************************/

for i = 1 to MAX_BKT_# do
    bkt = HT.bkt_entries[i];
    if bkt.status <> empty
    then
        /* Put the bucket number into the message. */
        perform GET_REC_BLOCK (bkt.block_address, blk);
        last = false;
        repeat
            /* Extract the contents of blk.contents */
            /* and copy them into msg.              */
            if the msg is full
            then
                send msg to all of the backends;
                reset the length of msg to 0;
            end if;

            if blk.next_blk_address = blk.own_address
            then
                /* This block is the last block for
                   this bucket. */
                last = true;
            else
                /* Move on to the next block of this bucket. */
                perform GET_REC_BLOCK (blk.next_blk_address, blk);
            end if;
        until last;
    end if;
end for;

send the msg to all of the other backends;
end procedure BROADCAST_TARGET_INFO;
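
The buffering pattern in this procedure (fill a message, send it when it is full, then send whatever remains at the end) can be sketched in Python as follows. This is a rough illustration under stated assumptions: the dictionary standing in for the disk, the `send` callback, the `msg_capacity` parameter and the `bucket:` prefix are all hypothetical, and None is used to mark the last block of a bucket, whereas the pseudocode compares the next-block address with the block's own address.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-in for the disk: logical address -> (contents, next address).
# A next address of None marks the last block of a bucket (an assumption).
Disk = Dict[int, Tuple[str, Optional[int]]]


def broadcast_target_info(bucket_heads: List[Optional[int]], disk: Disk,
                          send: Callable[[str], None], msg_capacity: int) -> None:
    """Stream every non-empty bucket through a fixed-capacity message."""
    msg = ""

    def put(text: str) -> None:
        nonlocal msg
        for ch in text:
            if len(msg) == msg_capacity:   # message is full: send it, start over
                send(msg)
                msg = ""
            msg += ch

    for bucket_no, addr in enumerate(bucket_heads):
        if addr is None:                   # empty bucket, nothing to send
            continue
        put(f"{bucket_no}:")               # illustrative bucket-number prefix
        while addr is not None:            # walk the bucket's block chain
            contents, addr = disk[addr]
            put(contents)

    if msg:                                # flush the last, partly filled message
        send(msg)


# Bucket 0 has two chained blocks, bucket 1 is empty, bucket 2 has one block.
disk = {10: ("AB", 11), 11: ("CD", None), 20: ("EF", None)}
sent: List[str] = []
broadcast_target_info([10, None, 20], disk, sent.append, msg_capacity=4)
```

With a capacity of four characters the ten-character stream is delivered as three messages, the last one only partly filled.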

APPENDIX F

THE MERGING PROCEDURE PROGRAM SPECIFICATIONS



procedure MERGE (input: source_request_id,
                        logical_address_of_source_table,
                        logical_address_of_target_table);
/******************************************************
* This procedure is used to perform the merging       *
* operation over the source records and the target    *
* records.                                            *
*                                                     *
* Notice that the input addresses are the logical     *
* disk addresses of the two hashing tables.           *
*                                                     *
* Data structures and variables used in this          *
* procedure are:                                      *
*                                                     *
* 1. logical_address_of_source_table,                 *
*    logical_address_of_target_table:                 *
*      The logical disk addresses of the source and   *
*      target hashing tables. The address type is     *
*      defined in the commdata.def file.              *
* 2. source_table, target_table:                      *
*      Variables of hashing_table data type (see      *
*      Appendix G) that represent the source-hashing  *
*      table and the target-hashing table.            *
* 3. i: A general purpose index.                      *
* 4. max_bucket_number:                               *
*      The largest bucket number of a hashing table.  *
******************************************************/






/* Retrieve the two hashing tables by the input  */
/* logical addresses.                            */
/* Note: Due to the limited memory space, we may */
/* not be able to bring in the entire table.     */
perform GET_HASH_TABLE (logical_address_of_source_table,
                        source_table);
perform GET_HASH_TABLE (logical_address_of_target_table,
                        target_table);

/* Reserve a result buffer. */
perform GET_BUFFER (result_buffer, source_request_id);

/* This routine will allocate an instance of a
   result buffer, put the request id into the
   header of the buffer and initialize the
   length of the buffer to 0.

   This routine has already been coded in
   the retp.c file. */

i = 0;
while i < max_bucket_number do
    if [(source_table.bucket_entry[i].status <> empty)
        and
        (target_table.bucket_entry[i].status <> empty)]
    then
        /* There is a collision. */
        /* Retrieve the records from both blocks and
           perform the merging operation. */
        X = source_table.bucket_entry[i].logical_address;
        Y = target_table.bucket_entry[i].logical_address;
        perform MERGING_OPERATION (X, Y, result_buffer);

        /* This routine will perform the merging
           operation and send the merged results
           to the controller. */
    end if;
    i = i+1;
end while;

/* Signal PP upon the completion of the source and */
/* target request.                                 */
end procedure MERGE;
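
The bucket-by-bucket scan in MERGE can be illustrated with a short Python sketch. It assumes each hashing table is already in memory as a list of buckets (each bucket a list of records), and the `merging_operation` argument is a hypothetical stand-in for the procedure of the same name; here it simply pairs every source record with every target record.

```python
from typing import Callable, List, Sequence

Bucket = List[str]


def merge(source_table: Sequence[Bucket], target_table: Sequence[Bucket],
          merging_operation: Callable[[Bucket, Bucket], List[str]]) -> List[str]:
    """Visit buckets pairwise; only buckets non-empty in both tables are merged."""
    results: List[str] = []
    for src_bucket, tgt_bucket in zip(source_table, target_table):
        if src_bucket and tgt_bucket:      # the "collision" case of the pseudocode
            results.extend(merging_operation(src_bucket, tgt_bucket))
    return results


def pair_all(src: Bucket, tgt: Bucket) -> List[str]:
    """Placeholder merging operation: concatenate every source/target pair."""
    return [s + "+" + t for s in src for t in tgt]


source = [["s0"], [], ["s2a", "s2b"]]
target = [[], ["t1"], ["t2"]]
merged = merge(source, target, pair_all)
```

Buckets 0 and 1 are skipped because one side is empty, so only the records of bucket 2 are ever compared. This is what keeps the operation from comparing every source record with every target record.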

procedure MERGING_OPERATION
          (input: logical_address_source_block,
                  logical_address_target_block,
                  result_buffer;
           output: result_buffer);
/******************************************************
* This procedure is used to perform the following     *
* tasks:                                              *
* 1. Extract the records from both the source         *
*    block and the target block.                      *
* 2. Compare the common attribute values              *
*    of the source and target records.                *
*    If they are equal, then perform the merging      *
*    operation.                                       *
* 3. Put the merged results into a result buffer.     *
*    If the buffer is full, then send the buffer      *
*    to the controller and reinitialize the           *
*    buffer length to 0 so that the buffer can        *
*    be reused.                                       *
*    Otherwise, return the logical address of the     *
*    result buffer to the calling procedure.          *
*                                                     *
* Data structures and variables used in this          *
* procedure are:                                      *
* 1. source_block, target_block:                      *
*      Variables of the data type BKT_BLK which       *
*      are used to represent the blocks of the        *
*      source hashing table or the target hashing     *
*      table.                                         *
*      BKT_BLK is defined in hashing_module.def       *
*      (see Appendix G).                              *
* 2. source_done, target_done:                        *
*      Boolean variables which are used to indicate   *
*      the completion of processing either source     *
*      records or target records.                     *
* 3. i, j: General purpose indexes.                   *
******************************************************/

/* source_address and target_address are local copies */
/* of the block addresses, so that the target blocks  */
/* can be rescanned for every source block.           */
source_address = logical_address_source_block;
source_done = false;

/* Continue retrieving the source blocks by the      */
/* logical address until there are no more blocks.   */
repeat
    source_block = GET_BLOCK (source_address);
    target_address = logical_address_target_block;
    target_done = false;

    /* Continue retrieving the target blocks by the    */
    /* logical address until there are no more blocks. */
    repeat
        target_block = GET_BLOCK (target_address);
        i = 0;
        while source_block.body[i] <> EOB do
            /* Retrieve one common attribute_value and one */
            /* record from the source block.               */
            source_value = GET_VALUE (source_block.body, i);
            source_record = GET_RECORD (source_block.body, i);
            j = 0;
            while target_block.body[j] <> EOB do
                /* Retrieve one common attribute_value and */
                /* one record from the target block.       */
                target_value = GET_VALUE (target_block.body, j);
                target_record = GET_RECORD (target_block.body, j);
                if source_value = target_value
                then
                    /* Append target record at the end of   */
                    /* source record and put the newly      */
                    /* merged record into the result buffer. */
                    result = APPEND (source_record, target_record);
                    result_length = STRING_LENGTH (result);
                    perform HB$PUT_SEND (result_buffer, result,
                                         result_length);
                end if;
                /* Go to the next target record. */
                j = j+1;
            end while; /* End the target-record loop. */
            i = i+1;
        end while; /* End the source-record loop. */

        /* Are the target records done? */
        if target_block.header.next_block_address =
           target_block.header.this_block_address
        then
            target_done = true;
        else
            target_address = target_block.header.next_block_address;
        end if;
    until target_done;

    /* Are the source records done? */
    if source_block.header.next_block_address =
       source_block.header.this_block_address
    then
        source_done = true;
    else
        source_address = source_block.header.next_block_address;
    end if;
until source_done;
end procedure MERGING_OPERATION;
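
The core of this procedure is a nested-loop comparison of the records that landed in the same bucket. A minimal Python sketch of that comparison follows, assuming each record has already been extracted as a pair of its common attribute value and its text; block chaining, the result buffer and the send step are left out, and the names are illustrative.

```python
from typing import List, Tuple

Record = Tuple[str, str]   # (common attribute value, record text)


def merging_operation(source_records: List[Record],
                      target_records: List[Record]) -> List[str]:
    """Nested-loop merge of two record lists from the same bucket."""
    merged: List[str] = []
    for s_value, s_record in source_records:
        for t_value, t_record in target_records:
            # Records in one bucket share a hash value, not necessarily the
            # attribute value, so the values themselves must still be compared.
            if s_value == t_value:
                merged.append(s_record + t_record)   # target appended to source
    return merged


source = [("7", "<name,Ann>"), ("9", "<name,Bob>")]
target = [("7", "<dept,CS>"), ("8", "<dept,EE>"), ("7", "<dept,Math>")]
merged = merging_operation(source, target)
```

The source record with value 7 matches two target records and so produces two merged records, while the record with value 9 matches nothing and produces none.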

APPENDIX G

THE HASHING MODULE DATA STRUCTURE DEFINITIONS



In this appendix we present the definitions of the data
structures used in the previous appendices. We refer to the
definitions as hashing_module.def.

1. hash_buffer:

This is the buffer which stores the hashed information
of records.

Request_id     --> The request id of the hashed records.

Length         --> The current length of the Hashed_results.

Hashed_results --> An array of character strings used for
                   storing the hashed records.

The format of the Hashed_results is:

{hashed_record_info}+ EOReq EOB
where

hashed_record_info ::= bucket_number EOV {Rec}
Rec ::= {attribute_value_pair}* EORec
attribute_value_pair ::=
        attribute_name EON attribute_value EOV
"+" means one or more occurrences.

EOB  : A special character which is used as a marker
       for the end-of-buffer.

EOV  : A special character which is used as a marker
       for the end-of-value.

EON  : A special character which is used as a marker
       for the end-of-attribute_name.

EORec: A special character which is used as a marker
       for the end-of-record.

EOReq: A character, either 1 or 0, which is
       used to indicate the end of a request.

       1: end of a request.

       0: not end of request, more buffers are coming.
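
As an illustration of the Hashed_results format, the Python sketch below writes and reads a buffer that follows the grammar given for hash_buffer. Several points are assumptions: the appendix does not give the actual marker characters, so four ASCII control characters are used as placeholders; `{Rec}` is read as exactly one record per bucket number; and the function names are hypothetical.

```python
from typing import List, Tuple

# Placeholder marker characters (assumed; the real values are not stated).
EOV, EON, EOREC, EOB = "\x1f", "\x1e", "\x1d", "\x1c"

Pair = Tuple[str, str]              # (attribute_name, attribute_value)
Entry = Tuple[int, List[Pair]]      # (bucket_number, one record)


def encode(entries: List[Entry], end_of_request: bool) -> str:
    """Build {hashed_record_info}+ EOReq EOB from a list of entries."""
    parts: List[str] = []
    for bucket_number, pairs in entries:
        parts.append(f"{bucket_number}{EOV}")
        for name, value in pairs:
            parts.append(f"{name}{EON}{value}{EOV}")
        parts.append(EOREC)
    parts.append("1" if end_of_request else "0")   # EOReq flag
    parts.append(EOB)
    return "".join(parts)


def decode(buffer: str) -> Tuple[List[Entry], bool]:
    """Inverse of encode; returns the entries and the end-of-request flag."""
    if len(buffer) < 2 or not buffer.endswith(EOB):
        raise ValueError("buffer does not end with EOReq EOB")
    body, eoreq = buffer[:-2], buffer[-2]
    entries: List[Entry] = []
    for chunk in body.split(EOREC)[:-1]:           # text after the last EORec is empty
        fields = chunk.split(EOV)[:-1]             # every field is closed by EOV
        pairs = [tuple(f.split(EON, 1)) for f in fields[1:]]
        entries.append((int(fields[0]), pairs))
    return entries, eoreq == "1"


entries = [(3, [("name", "Ann"), ("age", "30")]), (5, [])]
buffer = encode(entries, end_of_request=False)
```

Decoding `buffer` gives back the same entries together with a false end-of-request flag, which in the format above means that more buffers are coming.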

2. REC_BLOCK:

Blocks used by buckets to store the records and their
common attribute values.

A REC_BLOCK is composed of two fields, a header
and a contents.

header   --> This part contains the status
             of this block.

contents --> This part contains the records
             and their common attribute values.

The format of the contents of the REC_BLOCK is:

{Rec}+ EOB

The header contains two parts:

length        --> An integer to indicate the total
                  length of the records in this
                  block.

next_blk_addr --> The logical address of the next
                  block of the same bucket. (If
                  this block is the first block of
                  the bucket, then a null address
                  will be put in here.)

                  The type of this field is
                  address_definition and is
                  defined in the commdata.def file.



3. Bucket_entry:

status        --> A character which is either 1 for
                  not empty or 0 for empty.

block_address --> The logical address of the block
                  of this bucket.

4. Hash_table: An array of 2048 bucket_entries.